The Problem
In this problem, we need to calculate the total score of each hacker, where the total score is defined as the sum of their maximum scores across all challenges. We then need to print the hacker_id, name, and total score of the hackers, sorted by descending total score and ascending hacker_id for ties. We need to exclude all hackers with a total score of 0.
The Input
The input consists of two tables:
Hackers Table: The hacker_id is the id of the hacker, and name is the name of the hacker.
Submissions Table: The submission_id is the id of the submission, hacker_id is the id of the hacker who made the submission, challenge_id is the id of the challenge for which the submission belongs to, and score is the score of the submission.
Sample input and output are available for more in-depth understanding of the problem.
Sample Input
Hackers Table:
Submissions Table:
The Output
Sample Output:
4071 Rose 191 74842 Lisa 174 84072 Bonnie 100 4806 Angela 89 26071 Frank 85 80305 Kimberly 67 49438 Patrick 43
Explanation
Hacker 4071 submitted solutions for challenges 19797 and 49593, so the total score =95+max(43,96)=191.
Hacker 74842 submitted solutions for challenges 19797 and 63132, so the total score =max(98,5)+76=174.
Hacker 84072 submitted solutions for challenges 49593 and 63132, so the total score =100+0=100.
The total scores for hackers 4806, 26071, 80305, and 49438 can be similarly calculated.
The Solution
We'll discuss three SQL solutions, each with different strategies and trade-offs.
Source Code 1
The first source code creates a total_score
Common Table Expression (CTE) that joins the Hackers and Submissions tables and calculates the maximum score per challenge for each hacker. It then sums these max scores per hacker and filters out hackers with a total score of 0. It orders the result by total score in descending order and hacker_id in ascending order for ties. This solution uses the OVER
clause with PARTITION BY
to calculate the max score per challenge per hacker, which simplifies the subsequent aggregation.
WITH total_score AS ( SELECT DISTINCT h.hacker_id, h.name, s.challenge_id, MAX(s.score) OVER (PARTITION BY h.hacker_id, s.challenge_id) AS max_score_per_subm FROM Hackers h JOIN Submissions s ON h.hacker_id = s.hacker_id ) SELECT hacker_id, name, SUM(max_score_per_subm) AS total FROM total_score GROUP BY hacker_id, name HAVING SUM(max_score_per_subm) > 0 ORDER BY total DESC, hacker_id
Source Code 2
The second solution differs in that it first calculates the max score per challenge per hacker using a GROUP BY
clause in the total_score
CTE, then sums these max scores per hacker in the score_per_hacker
CTE. This involves two separate groupings, which may increase execution time. However, it also separates concerns and can be easier to understand.
WITH total_score AS ( SELECT s.hacker_id, MAX(s.score) AS max_score FROM Submissions s GROUP BY s.hacker_id, s.challenge_id ), score_per_hacker AS ( SELECT ts.hacker_id, SUM(ts.max_score) AS total_score FROM total_score ts GROUP BY ts.hacker_id ) SELECT sp.hacker_id, h.name, sp.total_score FROM score_per_hacker sp JOIN Hackers h ON sp.hacker_id = h.hacker_id WHERE sp.total_score > 0 ORDER BY sp.total_score DESC, sp.hacker_id
Source Code 3
The third solution uses the ROW_NUMBER()
function to assign a unique rank to each submission by each hacker for each challenge. It then only includes the highest-ranked (i.e., highest-scoring) submission for each challenge in the total score. This strategy avoids the need to use DISTINCT
in the CTE or to group by challenge_id, potentially improving performance.
SELECT hacker_id, name, SUM(CASE WHEN rn = 1 THEN score ELSE 0 END) AS total_score FROM ( SELECT h.hacker_id, h.name, s.score, ROW_NUMBER() OVER(PARTITION BY s.hacker_id, s.challenge_id ORDER BY s.score DESC) as rn FROM Hackers h JOIN Submissions s ON h.hacker_id = s.hacker_id ) t GROUP BY hacker_id, name HAVING SUM(CASE WHEN rn = 1 THEN score ELSE 0 END) > 0 ORDER BY total_score DESC, hacker_id
Conclusion
All three solutions achieve the same result but use different techniques to calculate the total score of each hacker. The first solution is straightforward but might not be as efficient due to its use of DISTINCT
. The second solution separates concerns by creating a CTE for each step, which could improve readability at the cost of execution time. The third solution is likely the most efficient due to its use of the ROW_NUMBER()
function to select the highest score per challenge per hacker directly.
In summary, while different SQL queries can achieve the same result, their performance can vary significantly based on the SQL features and functions used. Therefore, it's crucial to consider different approaches and understand their trade-offs.
You can find the original problem at HackerRank.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)