Retrieving the last record in each group

This is what I had been looking for a long time now. I have a table where I accumulate analytical data about customers’ usage like storage, bandwidth, licensed users, etc. I get a daily report for each of the server. My goal was to get the most recent data for any customer/server and so I found a way to do that. Thanks to this post. In spite of bookmarking this link, I decided to make notes of how the query for the LEFT JOIN works to get the last set of data for a customer.

Here’s the query I would use to get the latest row for each hostname:

SELECT u.*
FROM usage_data u LEFT JOIN usage_data ut
ON (u.hostname = ut.hostname AND u.id < ut.id)
WHERE ut.id IS NULL
ORDER BY u.hostname

Let’s assume we have the following set of data for a customer/hostname – usage_data(id, hostname, report_date, storage, bandwidth, licensed_users ….).

id | hostname            | report_date | .....
1  | customer.domain.com | 03/15/2015  | .....
2  | customer.domain.com | 03/16/2015  | .....
3  | customer.domain.com | 03/17/2015  | .....

When I do a simple left join, ON (u.hostname = ut.hostname), I’ll see resultant data with following id mappings:
1-3
1-2
1-1
2-3
2-2
2-1
3-3
3-2
3-1

But when I add another condition, the join looks like ON (u.hostname = ut.hostname AND u.id < ut.id), I’m only mapping rows for which u.id < ut.id. Therefore, you get the following mappings for id in [1, 2]
1-3
1-2
2-3
Now, for the id=3 in table u, there isn’t a row in ut that satisfies the condition u.id < ut.id. But since we’re using a LEFT JOIN, the database engine ensures that there is an entry for each row on the LEFT table. Therefore, the engine returns row for id=3 in table u and replaces by NULL for each value in table ut so our result set looks like the following
1-3
1-2
2-3
3-NULL

That’s what we wanted. A way to identify a row uniquely that is the latest. Now if we add another WHERE clause in the result set, to get only that data that has ut.id as NULL, we will get the row corresponding to the latest id value in table u. Therefore, the LEFT JOIN query makes a little more sense to me now and hopefully to you as well.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s