Linux Shell – Unescape an escaped JSON string

I have a process that gets nightly incremental SQL dumps as INSERT statements from a MySQL database and INSERTS them into a PostgreSQL database. The INSERT statements output from the MySQL Dump is an escaped string (the original string being a JSON). It looks something like this:

INSERT INTO t_usage VALUES (0,'Nu3agpAxIZTJZ3CARVpxZj6Cs5OimYIqaiPVzGJIVZ/0wIEjMN/e/u7CuTKMHptGqEkvboCIvbftG/Reo1TisxknmzvPsKdlAMnHSuTMOWx/a3OPpapCASQsmASOXcpjvl+xlsqAn5280Z7iptP8H1JWf1s5Du93ZlRoE2+XAIQ=','192.168.97.168','{\"payload\": {\"0\": {\"sw_version\": \"kw2016.03.10-ng8\", \"users\": {\"admin_users\": 1, \"users_licensed\": 1, \"locked_users\": 0, \"total_user_profiles\": 4, \"recipient_users\": 0, \"users_need_passwd_reset\": 1, \"total_users\": 1, \"suspended_users\": 0, \"restricted_users\": 0, \"users_need_activation\": 0, \"users_not_licensed\": 0, \"\\\">_users\": 0, \"end_users\": 0, \"standard_users\": 2}, \"license\": {\"search_licensed\": true, \"2fa_licensed\": true, \"expiry\": \"2025-12-25\", \"package_id\": \"kiteworks Enterprise Connect\", \"dlp_licensed\": true, \"signature\": \"xyz=\", \"deployment_id\": \"000000\", \"customer_id\": \"000000\", \"licensed_users\": 500, \"anti_virus_licensed\": true, \"deployment_type\": \"On Premise\"}, \"storage\": {\"users_storage_limit\": \"2097152000\", \"users_storage_used\": \"0\", \"tenant_storage_used\": {}, \"tenant_storage_limit\": \"-1\"}, \"microsoft_reporting\": {\"number_of_users_who_opened_documents\": \"\", \"number_of_opens_office_online\": 0, \"is_explicitly_disabled_office_online\": 0, \"total_users_who_use_office_online_and_mobile\": 0, \"number_of_opens_office_desktop\": 0, \"is_elect_office_online\": false, \"number_of_total_opens\": 0, \"number_of_documents_opened\": \"\"}, \"bandwidth\": {\"downloads\": {}, \"bandwidth_limit\": -1, \"uploads\": {}}, \"service_hostname\": \"pa.dev\", \"active\": 1, \"configuration\": {\"ec_server_count\": 0, \"ec_enabled\": false, \"branding_used\": false, \"migration_status\": \"\", \"dlp_enabled\": false, \"2fa_enabled\": \"\", \"config_max_file_version\": 10, \"ec_source_types\": {\"assigned_sources\": [], \"user_sources\": []}, \"migration_licensed\": [\"Bulk Migration Licensed\", \"Migration Assistant Licensed\"], \"migration_enabled\": \"Migration not enabled\", \"ldap_server_cnt\": 0, \"anti_virus_enabled\": false, \"location_count\": 0, \"total_folders\": 1, \"sys_roles_count\": {\"Web\": 1, \"Search\": 1, \"Anti_virus\": 0, \"Storage\": 1, \"EC\": 0, \"Application\": 1}, \"total_files\": 0, \"search_enabled\": true, \"ldap_enabled\": false, \"sso_enabled\": \"\", \"total_files_numerous_versions\": 0}, \"customer_name\": \"host\"}, \"start_time\": 1470873600, \"end_time\": 1470959999}, \"error\": 0}','2016-08-12 22:30:30',NULL);

So far, I used a sed command on the SQL dump and unescaped the single and double quotes.

sed -i "s/\\\'/\'\'/g" t_usage_incr.sql (Converting \' to a ")
sed -i 's/\\\"/\"/g' t_usage_incr.sql (Converting \" to a ")

But someone tried to do an SQL injection that led to backslashes in the data and my unescaping wouldn’t result in a valid JSON. So the part in the input

\"\\\">_users\"
gets converted into
"\\">_users"

which is still an invalid JSON output. I had missed unescaping the backslash itself. The following sed sequence when used:

sed -i "s/\\\'/\'\'/g" t_usage_incr.sql
sed -i 's/\\\\/\\/g' t_usage_incr.sql
sed -i 's/\\\"/\"/g' t_usage_incr.sql

results in output

"\">_users"

which is eventually a valid JSON because the double quote in the middle is now escaped properly.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s