Home Ensuring Data Type Consistency with SQL
Post
Cancel

Ensuring Data Type Consistency with SQL

Introduction

Data type consistency across databases is crucial, especially for tasks like data migrations. Based on my experience in managing data migrations, I have developed a few SQL scripts that streamline the process of comparing and converting data types between database tables. These scripts are particularly valuable when working with large tables, often with more than 100 columns, where manual data type adjustments are not practical and are prone to errors.

Code

Script Overview

This repository includes two main scripts to ensure data type consistency and facilitate data migrations.

Compare Table Column Data Types (compare_table_column_data_types.sql)

This script compares the data types of columns in two tables within a database. By fetching column metadata from the INFORMATION_SCHEMA.COLUMNS table, it checks if the data types in the source table match those in the target table. It is invaluable for identifying discrepancies in the data types to ensure consistency before initiating data migrations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
select
    src.column_name as source_column,
    src.data_type as source_type,
    dst.column_name as target_column,
    dst.data_type as target_type,
    case
        when dst.data_type = src.data_type then true
        else false
    end as is_data_type_same
from
    (
        select
            ordinal_position,
            column_name,
            data_type
        from
            information_schema.columns
        where
            table_schema = 'test'
            and table_name = 'source_tbl'
    ) src
    join (
        select
            ordinal_position,
            column_name,
            data_type
        from
            information_schema.columns
        where
            table_schema = 'test'
            and table_name = 'target_tbl'
    ) dst on src.column_name = dst.column_name
order by
    src.ordinal_position asc;

Example Output Compare Table Column Data Types Example Output

Generate Data Type Casting SQL (generate_data_type_casting.sql)

When discrepancies are found, casting data correctly is essential. This script generates the necessary SQL expressions to cast data types from one table to match those in another. This functionality ensures that data fits into the target table and facilitates migrations.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
select
    src.column_name as src_column,
    src.data_type as src_type,
    dst.column_name as dst_column,
    dst.data_type as dst_type,
    case
        when dst.data_type = 'int' and src.data_type <> 'int' then concat('cast(nullif(', src.column_name, ',\'\') as unsigned) as ', src.column_name, ',')
        when dst.data_type = 'decimal' and src.data_type <> 'decimal' then concat('cast(nullif(', src.column_name, ',\'\') as decimal) as ', src.column_name, ',')
        when dst.data_type = 'varchar' and src.data_type <> 'varchar' then concat('cast(', src.column_name, ' as char) as ', src.column_name, ',')
        when dst.data_type = 'date' and src.data_type <> 'date' then concat('str_to_date(nullif(', src.column_name, ',\'\'), \'%Y-%m-%d\') as ', src.column_name, ',')
        else concat(src.column_name, ',')
    end as `sql`
from
    (select column_name, ordinal_position, data_type
     from INFORMATION_SCHEMA.COLUMNS
     where table_schema = 'test'and
		table_name = 'source_tbl'
    ) src
join
    (select column_name, ordinal_position, data_type
     from INFORMATION_SCHEMA.COLUMNS
     where table_schema = 'test' and
		table_name = 'target_tbl'
    ) dst
on src.column_name = dst.column_name
order by dst.ordinal_position asc;

Example Output Generate Data Type Casting SQL Example Output

You can take the generated expressions and create a SQL statement to copy data from the source table into the target table, as shown in an example below:

1
2
3
4
5
6
7
8
9
10
11
12
13
insert into test.target_tbl
select
	id,
	name,
	age,
	email,
	cast(nullif(salary,'') as decimal) as salary,
	cast(nullif(quantity,'') as unsigned) as quantity,
	cast(is_active as char) as is_active,
	cast(nullif(total_hours,'') as unsigned) as total_hours,
	str_to_date(nullif(last_login,''), '%Y-%m-%d') as last_login,
	created_at
from test.source_tbl;

Discussion

I needed these scripts to handle data migration issues that I came across at work. Errors often occurred when data types between the source and target tables were not aligned. By automating the data comparison and type conversion processes, these scripts reduce manual efforts, which is especially beneficial for tables with extensive columns.

While the scripts shown here are used with small test tables, their strength is more apparent in larger applications. Anyone facing similar challenges can adapt the scripts to their environments to ensure data type consistency and support data migrations.

This post is licensed under CC BY 4.0 by the author.
Contents