Run queries like count, distinct, etc from a list of CSV, Excel column on a DB

Someone recently sent me an Excel with a set of data and claimed that one of the rows that stored the ID’s (unique) had set of duplicates. One way to read this data in an excel is to sort and scroll through to see if you can find any duplicates. The other way that I found out to be cooler and easier for a developer was to get the data as a view in a SQL editor and run queries on top of it. Here are the steps to use the dataset in SQL

1. Get the Raw data that you need to query upon, e.g. 10 rows below for simplicity

7
8
9
12
15
16
17
19
21
23

2. Use your favorite text editor to convert the data to something like this:

SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 12
UNION SELECT 15
UNION SELECT 16
UNION SELECT 17
UNION SELECT 19
UNION SELECT 21

3. Now convert this to a sub-query that can be used in SQL to query the data

SELECT depid FROM
(SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 12
UNION SELECT 15
UNION SELECT 16
UNION SELECT 17
UNION SELECT 19
UNION SELECT 21) deployments(depid)

4. Now in order to get the duplicates, all I had to do was to get a count for all records and get a count for the duplicate records. If that didn’t match, I could have done grouping and got the defaulters

SELECT DISTINCT depid FROM
(SELECT 7
UNION SELECT 8
UNION SELECT 9
UNION SELECT 12
UNION SELECT 15
UNION SELECT 16
UNION SELECT 17
UNION SELECT 19
UNION SELECT 21) deployments(depid)
Advertisements

MySQL : Allow remote connections

In order to allow remote SQL connections from any IP address, the following changes need to be made:
1. Locate the my.cnf file using the command

$ locate my.cnf
/etc/opt/mysql/my.cnf

2. Modify the bind-address and change it from localhost to all IP’s

# bind only to localhost
#bind-address = 127.0.0.1
bind-address = 0.0.0.0

3. Disable skip-networking in my.cnf

#skip-networking

4. Grant remote access the root user from any ip (or specify your ip instead of %)

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%'
    IDENTIFIED BY 'your_root_password'
    WITH GRANT OPTION;
FLUSH PRIVILEGES;

5. Restart mysql services

service mysql restart

Linux Shell – Unescape an escaped JSON string

I have a process that gets nightly incremental SQL dumps as INSERT statements from a MySQL database and INSERTS them into a PostgreSQL database. The INSERT statements output from the MySQL Dump is an escaped string (the original string being a JSON). It looks something like this:

INSERT INTO t_usage VALUES (0,'Nu3agpAxIZTJZ3CARVpxZj6Cs5OimYIqaiPVzGJIVZ/0wIEjMN/e/u7CuTKMHptGqEkvboCIvbftG/Reo1TisxknmzvPsKdlAMnHSuTMOWx/a3OPpapCASQsmASOXcpjvl+xlsqAn5280Z7iptP8H1JWf1s5Du93ZlRoE2+XAIQ=','192.168.97.168','{\"payload\": {\"0\": {\"sw_version\": \"kw2016.03.10-ng8\", \"users\": {\"admin_users\": 1, \"users_licensed\": 1, \"locked_users\": 0, \"total_user_profiles\": 4, \"recipient_users\": 0, \"users_need_passwd_reset\": 1, \"total_users\": 1, \"suspended_users\": 0, \"restricted_users\": 0, \"users_need_activation\": 0, \"users_not_licensed\": 0, \"\\\">_users\": 0, \"end_users\": 0, \"standard_users\": 2}, \"license\": {\"search_licensed\": true, \"2fa_licensed\": true, \"expiry\": \"2025-12-25\", \"package_id\": \"kiteworks Enterprise Connect\", \"dlp_licensed\": true, \"signature\": \"xyz=\", \"deployment_id\": \"000000\", \"customer_id\": \"000000\", \"licensed_users\": 500, \"anti_virus_licensed\": true, \"deployment_type\": \"On Premise\"}, \"storage\": {\"users_storage_limit\": \"2097152000\", \"users_storage_used\": \"0\", \"tenant_storage_used\": {}, \"tenant_storage_limit\": \"-1\"}, \"microsoft_reporting\": {\"number_of_users_who_opened_documents\": \"\", \"number_of_opens_office_online\": 0, \"is_explicitly_disabled_office_online\": 0, \"total_users_who_use_office_online_and_mobile\": 0, \"number_of_opens_office_desktop\": 0, \"is_elect_office_online\": false, \"number_of_total_opens\": 0, \"number_of_documents_opened\": \"\"}, \"bandwidth\": {\"downloads\": {}, \"bandwidth_limit\": -1, \"uploads\": {}}, \"service_hostname\": \"pa.dev\", \"active\": 1, \"configuration\": {\"ec_server_count\": 0, \"ec_enabled\": false, \"branding_used\": false, \"migration_status\": \"\", \"dlp_enabled\": false, \"2fa_enabled\": \"\", \"config_max_file_version\": 10, \"ec_source_types\": {\"assigned_sources\": [], \"user_sources\": []}, \"migration_licensed\": [\"Bulk Migration Licensed\", \"Migration Assistant Licensed\"], \"migration_enabled\": \"Migration not enabled\", \"ldap_server_cnt\": 0, \"anti_virus_enabled\": false, \"location_count\": 0, \"total_folders\": 1, \"sys_roles_count\": {\"Web\": 1, \"Search\": 1, \"Anti_virus\": 0, \"Storage\": 1, \"EC\": 0, \"Application\": 1}, \"total_files\": 0, \"search_enabled\": true, \"ldap_enabled\": false, \"sso_enabled\": \"\", \"total_files_numerous_versions\": 0}, \"customer_name\": \"host\"}, \"start_time\": 1470873600, \"end_time\": 1470959999}, \"error\": 0}','2016-08-12 22:30:30',NULL);

So far, I used a sed command on the SQL dump and unescaped the single and double quotes.

sed -i "s/\\\'/\'\'/g" t_usage_incr.sql (Converting \' to a ")
sed -i 's/\\\"/\"/g' t_usage_incr.sql (Converting \" to a ")

But someone tried to do an SQL injection that led to backslashes in the data and my unescaping wouldn’t result in a valid JSON. So the part in the input

\"\\\">_users\"
gets converted into
"\\">_users"

which is still an invalid JSON output. I had missed unescaping the backslash itself. The following sed sequence when used:

sed -i "s/\\\'/\'\'/g" t_usage_incr.sql
sed -i 's/\\\\/\\/g' t_usage_incr.sql
sed -i 's/\\\"/\"/g' t_usage_incr.sql

results in output

"\">_users"

which is eventually a valid JSON because the double quote in the middle is now escaped properly.

Retrieving the last record in each group

This is what I had been looking for a long time now. I have a table where I accumulate analytical data about customers’ usage like storage, bandwidth, licensed users, etc. I get a daily report for each of the server. My goal was to get the most recent data for any customer/server and so I found a way to do that. Thanks to this post. In spite of bookmarking this link, I decided to make notes of how the query for the LEFT JOIN works to get the last set of data for a customer.

Here’s the query I would use to get the latest row for each hostname:

SELECT u.*
FROM usage_data u LEFT JOIN usage_data ut
ON (u.hostname = ut.hostname AND u.id < ut.id)
WHERE ut.id IS NULL
ORDER BY u.hostname

Let’s assume we have the following set of data for a customer/hostname – usage_data(id, hostname, report_date, storage, bandwidth, licensed_users ….).

id | hostname            | report_date | .....
1  | customer.domain.com | 03/15/2015  | .....
2  | customer.domain.com | 03/16/2015  | .....
3  | customer.domain.com | 03/17/2015  | .....

When I do a simple left join, ON (u.hostname = ut.hostname), I’ll see resultant data with following id mappings:
1-3
1-2
1-1
2-3
2-2
2-1
3-3
3-2
3-1

But when I add another condition, the join looks like ON (u.hostname = ut.hostname AND u.id < ut.id), I’m only mapping rows for which u.id < ut.id. Therefore, you get the following mappings for id in [1, 2]
1-3
1-2
2-3
Now, for the id=3 in table u, there isn’t a row in ut that satisfies the condition u.id < ut.id. But since we’re using a LEFT JOIN, the database engine ensures that there is an entry for each row on the LEFT table. Therefore, the engine returns row for id=3 in table u and replaces by NULL for each value in table ut so our result set looks like the following
1-3
1-2
2-3
3-NULL

That’s what we wanted. A way to identify a row uniquely that is the latest. Now if we add another WHERE clause in the result set, to get only that data that has ut.id as NULL, we will get the row corresponding to the latest id value in table u. Therefore, the LEFT JOIN query makes a little more sense to me now and hopefully to you as well.

Execute MySQL Query from linux prompt (not db prompt)

Here’s a clean way to execute MySQL query without evening getting into the DB. Saves a step. But what is important here is that the password for the DB is generated via a script which can be used to connect to the DB.

mysql -uroot -p`python -c "from common_utils import common_lib; print common_lib.mysql_passwd()"` -D dbName -e "select tenant_hostname from tenant_ids"

Similarly, executing a DB backup by getting the password from a script result can be done as follows:

mysqldump -hlocalhost -uroot -p`python -c "from common_utils import common_lib; print common_lib.mysql_passwd()"` dbName tableName --result-file="/tmp/results.sql"