Run queries like count, distinct, etc from a list of CSV, Excel column on a DB

Someone recently sent me an Excel with a set of data and claimed that one of the rows that stored the ID’s (unique) had set of duplicates. One way to read this data in an excel is to sort and scroll through to see if you can find any duplicates. The other way that I found out to be cooler and easier for a developer was to get the data as a view in a SQL editor and run queries on top of it. Here are the steps to use the dataset in SQL

1. Get the Raw data that you need to query upon, e.g. 10 rows below for simplicity

7
8
9
12
15
16
17
19
21
23

2. Use your favorite text editor to convert the data to something like this:

SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 12
UNION SELECT 15
UNION SELECT 16
UNION SELECT 17
UNION SELECT 19
UNION SELECT 21

3. Now convert this to a sub-query that can be used in SQL to query the data

SELECT depid FROM
(SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 12
UNION SELECT 15
UNION SELECT 16
UNION SELECT 17
UNION SELECT 19
UNION SELECT 21) deployments(depid)

4. Now in order to get the duplicates, all I had to do was to get a count for all records and get a count for the duplicate records. If that didn’t match, I could have done grouping and got the defaulters

SELECT DISTINCT depid FROM
(SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 12
UNION SELECT 15
UNION SELECT 16
UNION SELECT 17
UNION SELECT 19
UNION SELECT 21) deployments(depid)
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s