Web-based System For Querying Top 100 Hit Music Songs

Project Requirements

The project will involve the creation of a web based system that will connect to a noSQL database system and allow the user to query data. This data is about songs and contains information about the artists, various characteristics of the songs such as years, valence, danceability, tempo, length etc. The results will be displayed on a web interface built using HTML and CSS.

Save Time On Research and Writing
Hire a Pro to Write You a 100% Plagiarism-Free Paper.
Get My Paper

Choice of the language

The system will be created using PHP scripting language for the following reasons:

PHP is among the easiest language to learn and use. It has the C syntax and therefore anybody with a background in a C-like language can be able to use it.

PHP can run on both Windows and Unix and any other operating system. Therefore, the system will be guaranteed to run on almost any system without the need for modification

Save Time On Research and Writing
Hire a Pro to Write You a 100% Plagiarism-Free Paper.
Get My Paper

PHP is open source and therefore no licenses are required. This will bring the project cost down. Also, hosting a PHP web application is cheap and there are many hosting companies.  It is even possible to host it for free.

Ease of Change

NoSQL database is the appropriate Database management system for this project. The reason for this is that it is flexible enough to accommodate future changes. For example, more information about the sources could be added in future. A relational database management system is not appropriate since it requires the data schema to be defined from the beginning. On the contrary, noSQL databases make it easy to scale out through sharding (Rouse,  2011).

Indexing

NoSQL DBMSs allow us to create indexes that can allow us to search data without the complex joins required by RDBMSs thus increasing the speed of performing queries. For example, we can create an index on the songs that we can use to get all information on both songs and artists without querying both entities.

Reliance on SQL for performing queries leads to limitations on the type of queries that can be performed against a data source. This is because complex joins are required to perform any useful queries. The ability of noSQL DBMSs to retrieve data without the need for complex joins makes the noSQL DBMS more scalable than RDBMSs (MongoDB Inc., 2018). Therefore, in this project we are not limited by SQL

The system architecture will include a web frontend user interface built using HTML and CSS. The Backend will be developed using PHP while the database will be a noSQL database called MongoDB.

Database Management System

The user will submit queries through the web frontend. The PHP backend processing will then submit the queries to the database. The results will then be processed by PHP code and displayed on the web user interface. The following diagram shows the system architecture diagram.

Figure 1. System Diagram

Database Analysis and Design

The songs’ data is mainly composed of numeric values of various types.  The data in its current structure does not contain any unique column that can be used to identify individual records. Therefore, it needs to be restructured and some normalization applied. For the various columns, the values differ by a large margin. For example, the values in the duration column has values as large as 400 while the values in the instrumentalness has values as small as 0.000001. The song_title would be expected to have unique values but as in real world, there are different artists with songs bearing the same titles.

The data contains many artists who can be identified by unique artist_id. Each artist can have many songs. Therefore, the relationship is one to many. Each song can have many source ids, while each source can have many songs. Therefore, the relationship is many to many. However, there is no information of about the sources except the source_id. It is expected that as the system evolves, this will be added.

During our database design, we will make the following assumptions:

Each song by a certain artist is has a unique title. As such, there is no artist with two songs bearing the same title. Since the artist_id is unique for every artist, we are going to use the title of the song and the artist_id to uniquely identify a particular song.

Although MongoDB does not enforce strict declaration of schema, we are going to create one for organization of our data. The data will be contained in a collection where we will have all the documents which will be inserted from the comma separated values file using PHP.  We will execute several queries against the database to retrieve results based several give criteria.

This project features two data entities that will be represented using the MongoDB schema. Each entity will be represented using a separate document.

Artists

An artist had a name, and Id. Using noSQL database, we can be able to represent additional data of an artist as a document within the artist document. As a result, this allows us to be able to get artist’s information quickly.

NoSQL Database

Artist Document

Below is a sample artist document.

{

_id:{artist_id:1},

artist_id: “ARTH9041187FB43E1F”,

artist_name: “Eminem”,

song_title: [“Without Me”,”Stan”, “The Real Slim Shady”,”Just Lose It”]

}

Figure 2. Songs schema diagram

Sample song document

{

_id:{artist_id:1,song_title:1,id:1},

song_title: “Lose Yourself”,

artist_id: “ARTH9041187FB43E1F”,

id: [“SOTITAT144C281345F”],

key:2,

energy:0.478829,

liveness: 0.82686,

tempo:85.7,

speechiness:0.201306,

sound_quality:0.001771,

instrumentalness:0.000026,

mode:1,

time_signature:4,

duration:265.74322,

loudness:-14.581,

valence:0.165595,

danceability:0.776651,

years:”2010s”

}

Figure 3. Artist schema diagram

We are going to normalize our data using both referencing and linked documents.  The reason for this choice of design pattern is to achieve the following.

The most important entity in our application is the songs. Each song contains information about an artist. Trying to normalize the application by linked document would involve creation of an artist document within each song document. While this approach allows us to access all information of an artist within the song, it would lead to repetition of an artist’s information within every song they have produced.

The data contains relationships hence normalization allows us to see how each entity is related with another. A good example is the song that contains many ids from the various sources while the same sources have other songs coming from them.

The use of normalization allows us to update one entity without affecting the other. It also becomes easy to query the related entity using the metadata contained in source entity.

References/relationships

Since our data is related, we must represent that relationship. The artist_id in the song document refers to the artist_id in artist. This is a form of manual referencing method (MongoDB Inc.). Since PHP does not have any support for DBRef method which is also an overkill in this project.

Figure 4. Relationship Diagram

From the above model, the following collection designs have been realized

Figure 5. Song’s Database model

Figure 6. Songs Collection design

Extending the System

The way I have designed the system has been with due consideration to future changes. For example, I have developed a general query format that can be allowed to query almost any query. You can be able to query any field, sort it either ascending or descending, give define the number of results through a parameter and even submit a criteria in form of an array. This allows the user to perform almost any query in addition to the ones that are already defined. The function prototype is shown below.

getSongsByField($field,$sort=”desc”,$limit,$where=array())

The function is on a separate class that can be extended and the function overridden with a different implementation. Additionally, more functions can be added and they can be called within any page by importing the class through the include construct like so:

include_once ‘MySongsClass.php’;

Conclusion

The choice of both the database and the programming language leads to a very strong combination for developing a system like this. The database design enable very quick querying of data while allowing ease of data manipulation. This leads to a very efficient system that is very user friendly and efficient. The system also allows future evolution through the use of modular code and which also brings the benefit of easy maintenance.

References

MongoDB Inc.. 2018. Advantages Of NoSQL. [online] Available at: https://www.mongodb.com/scale/advantages-of-nosql. [Accessed 25 March 2018].

MongoDB Inc.. 2018. Database References—MongoDB manual 3.6. [online] Available at: https://www.mongodb.com/scale/advantages-of-nosql. [Accessed 25 March 2018].

Rouse, M. (2011). Sharding. [online] TechTarget. Available at: https://searchcloudcomputing.techtarget.com/definition/sharding [Accessed 26 Mar. 2018].