Learning SQL BY Task Series (1) Movie Rating

1. Problem Description:

You are given serveral tables as follows:

  1. Table: Movies
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| title         | varchar |
+---------------+---------+

movie_id is the primary key for this table.
title is the name of the movie.
  1. Table: Users
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
+---------------+---------+
user_id is the primary key for this table.
  1. Table: Movie_Rating
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| user_id       | int     |
| rating        | int     |
| created_at    | date    |
+---------------+---------+

(movie_id, user_id) is the primary key for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user ' s review date. 

Your task is to write the Query Scripts to complete two subtasks:

  • Find the name of the user who has rated the greatest number of the movies.
    In case of a tie, return lexicographically smaller user name.

  • Find the movie name with the highest average rating in February 2020.
    In case of a tie, return lexicographically smaller movie name.

Example


Movies table:
+-------------+--------------+
| movie_id    |  title       |
+-------------+--------------+
| 1           | Avengers     |
| 2           | Frozen 2     |
| 3           | Joker        |
+-------------+--------------+

Users table:
+-------------+--------------+
| user_id     |  name        |
+-------------+--------------+
| 1           | Daniel       |
| 2           | Monica       |
| 3           | Maria        |
| 4           | James        |
+-------------+--------------+

Movie_Rating table:
+-------------+--------------+--------------+-------------+
| movie_id    | user_id      | rating       | created_at  |
+-------------+--------------+--------------+-------------+
| 1           | 1            | 3            | 2020-01-12  |
| 1           | 2            | 4            | 2020-02-11  |
| 1           | 3            | 2            | 2020-02-12  |
| 1           | 4            | 1            | 2020-01-01  |
| 2           | 1            | 5            | 2020-02-17  | 
| 2           | 2            | 2            | 2020-02-01  | 
| 2           | 3            | 2            | 2020-03-01  |
| 3           | 1            | 3            | 2020-02-22  | 
| 3           | 2            | 4            | 2020-02-25  | 
+-------------+--------------+--------------+-------------+

Result table:
+--------------+
| results      |
+--------------+
| Daniel       |
| Frozen 2     |
+--------------+

Daniel and Maria have rated 3 movies (“Avengers”, “Frozen 2” and “Joker”) but Daniel is smaller lexicographically.
Frozen 2 and Joker have a rating average of 3.5 in February but Frozen 2 is smaller lexicographically.

Prepared SQL KnowLedge

  • Since we need to write two indepent qeury and union each result to get the result table. In SQL, There are two union operators,which are
    UNION and UNION ALL, They all combines the result set of two or more SELECT statements. However, UNION ALL allows duplicates rows, while UNION will only contains distinct values. Reference

  • In this subtask 1, we should first sort by the number of ratings and if any two are the same, we should sort by its user name lexicographically. So, In order to sort in this way, Order cause should be written as follows:


  # ORDER BY column1 DESC, column2
  This sorts everything by column1 (descending) first, 
  and then by column2 (ascending, which is the default) 
  whenever the column1 fields for two or more rows are equal.

Solutions:

  1. For the first task, we could GROUP BY the user_id, and COUNT the number of movies (represented by movie_id) which the user rated. Sort by the number, and its lexicographically smaller user name. USE LIMIT to get the first record.

Query script for your reference.

SELECT u.name, COUNT(m.movie_id) as max_rating_num
FROM Movie_Rating AS m
LEFT JOIN Users AS u
ON m.user_id = u.user_id
GROUP BY m.user_id
ORDER BY max_rating_num DESC,u.name
LIMIT 1 
  1. For the second task, its quite similar to the first query. Notice that Since what we really concern about is the ratings in February 2020. So Before we group the rows and compute the AVE, we should filter the dta in February 2020 by WHERECAUSE. Still, the moving_rating table is the primary table, so we use the left join with the movie table to get its name.

Query script for your reference.

SELECT m.title, AVG(r.rating) as movie_rating  
FROM Movie_Rating AS r
LEFT JOIN Movies AS m
ON m.movie_id = r.movie_id
WHERE MONTH(r.created_at) = 2 AND YEAR(r.created_at) = 2020
GROUP BY r.movie_id
ORDER BY movie_rating DESC, m.title
LIMIT 1

Final Answer:

Combination with UNION keyword.


(SELECT u.name AS results
FROM Movie_Rating AS m
LEFT JOIN Users AS u
ON m.user_id = u.user_id
GROUP BY m.user_id
ORDER BY COUNT(m.movie_id) DESC,u.name
LIMIT 1)

UNION

(SELECT m.title AS results
FROM Movie_Rating AS r
LEFT JOIN Movies AS m
ON m.movie_id = r.movie_id
WHERE MONTH(r.created_at) = 2 AND YEAR(r.created_at) = 2020
GROUP BY r.movie_id
ORDER BY AVG(r.rating) DESC, m.title
LIMIT 1)

Author: Liang Tan
Reprint policy: All articles in this blog are used except for special statements CC BY 4.0 reprint polocy. If reproduced, please indicate source Liang Tan !
 Previous
Create time-series spatial databases  With TimeScaleDB and PostGIS Create time-series spatial databases With TimeScaleDB and PostGIS
1. Introduction:This post will go through some basic concepts and examples. All the experiments are running on the Ubant
2020-03-17 Liang Tan
Next 
Learning Netty from scratch - 1. Blocking, Non-Blocking && Synchronous, Asynchronous Learning Netty from scratch - 1. Blocking, Non-Blocking && Synchronous, Asynchronous
Let’s try to understand these 4 concept by an example. Suppose, You want to boil some water. Suppose you are Process A,
2019-08-10 Liang Tan
  TOC