Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MS SQL: multiple excluding table names like 'TABLE' in schema 'SCHEMA' not parsing any data #1328

Open
5 of 7 tasks
troley opened this issue Oct 15, 2021 · 0 comments · May be fixed by #1329
Open
5 of 7 tasks

MS SQL: multiple excluding table names like 'TABLE' in schema 'SCHEMA' not parsing any data #1328

troley opened this issue Oct 15, 2021 · 0 comments · May be fixed by #1329

Comments

@troley
Copy link

troley commented Oct 15, 2021

The docs state that the excluding table names matching command with comma separated table names should be used to exclude tables, but the matching keyword doesn't work at all for MS SQL (also mentioned in #1167). An error is produced indicating that the matching keyword is not recognized.

The excluding table names like command does work, but only for one table - schema pair. When multiple excluding table names like commands are provided, then no data will be parsed, but the script will run successfully.

So, the following example works:

excluding table names like 'some_table' in schema 'dbo'

The following example doesn't:

excluding table names like 'some_table' in schema 'dbo'
excluding table names like 'some_table' in schema 'dbo2'

Worth mentioning is that the including only table names like command is repeatable. So the following 2 examples both work correctly:

including only table names like 'some_table' in schema 'dbo'
including only table names like 'some_table' in schema 'dbo'
including only table names like 'some_table' in schema 'dbo2'

I would expect that there is a way that makes it possible to exclude multiple table - schema pairs for MS SQL, but as far as I know/tried this doesn't seem to be possible.


  • pgloader --version

    pgloader version "3.6.3047c9a"
    compiled with SBCL 2.0.1.debian
    
  • did you test a fresh compile from the source tree?

  • did you search for other similar issues?

  • how can I reproduce the bug?

    I don't have a remotely hosted MS SQL server to make this reproducible, sorry. I might add the create/insert setup queries later.

LOAD database
    FROM mssql://exampleuser:examplepass@localhost:1433/database_oct_2021
    INTO postgresql://postgres:password@localhost:5432/database_oct_2021
 
    WITH data only
 
    excluding table names like 'some_table' in schema 'dbo'
    excluding table names like 'some_table' in schema 'schema1'
    excluding table names like 'some_table' in schema 'schema2'
 
    alter schema 'dbo' rename to 'public'
 
    SET work_mem to '16MB', maintenance_work_mem to '512MB'
 
    BEFORE LOAD DO
        $$ SET TIME ZONE UTC; $$;
  • pgloader output you obtain
pgloader version 3.6.3047c9a
compiled with SBCL 2.0.1.debian
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2021-10-15T08:36:14.012000+02:00 NOTICE Starting pgloader, log system is ready.
2021-10-15T08:36:14.026000+02:00 INFO Starting monitor
2021-10-15T08:36:14.030000+02:00 LOG pgloader version "3.6.3047c9a"
2021-10-15T08:36:14.080000+02:00 INFO Parsed command:
LOAD database
    FROM mssql://exampleuser:examplepass@localhost:1433/database_oct_2021
    INTO postgresql://postgres:password@localhost:5432/database_oct_2021

    WITH data only

    excluding table names like 'some_table' in schema 'dbo'
    excluding table names like 'some_table' in schema 'schema1'
    excluding table names like 'some_table' in schema 'schema2'

    alter schema 'dbo' rename to 'public'

    SET work_mem to '16MB', maintenance_work_mem to '512MB'

    BEFORE LOAD DO
        $$ SET TIME ZONE UTC; $$;

2021-10-15T08:36:14.229000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {1007B6D703}>
2021-10-15T08:36:14.230000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:14.231000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:14.231000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:14.232000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:14.612000+02:00 NOTICE Executing SQL block for before load
2021-10-15T08:36:14.702000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {1007B6D703}>
2021-10-15T08:36:14.704000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:14.704000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:14.705000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:14.705000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:14.710000+02:00 DEBUG BEGIN
2021-10-15T08:36:14.711000+02:00 SQL SET TIME ZONE UTC;
2021-10-15T08:36:14.720000+02:00 LOG Migrating from #<MSSQL-CONNECTION mssql://exampleuser@localhost:1433/database_oct_2021 {1007B6BA03}>
2021-10-15T08:36:14.721000+02:00 LOG Migrating into #<PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {1007B6D703}>
Max connections reached, increase value of TDS_MAX_CONN
2021-10-15T08:36:15.007000+02:00 SQL MSSQL: sending query: -- params: dbname
--         table-type-name
--         including
--         filter-list-to-where-clause including
--         excluding
--         filter-list-to-where-clause excluding
  select c.TABLE_SCHEMA,
         c.TABLE_NAME,
         c.COLUMN_NAME,
         c.DATA_TYPE,
         CASE
         WHEN c.COLUMN_DEFAULT LIKE '((%' AND c.COLUMN_DEFAULT LIKE '%))' THEN
             CASE
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) = 'newid()' THEN 'GENERATE_UUID'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) LIKE 'convert(%varchar%,getdate(),%)' THEN 'today'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) = 'getdate()' THEN 'CURRENT_TIMESTAMP'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) LIKE '''%''' THEN SUBSTRING(c.COLUMN_DEFAULT,4,len(c.COLUMN_DEFAULT)-6)
                 ELSE SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4)
             END
         WHEN c.COLUMN_DEFAULT LIKE '(%' AND c.COLUMN_DEFAULT LIKE '%)' THEN
             CASE
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) = 'newid()' THEN 'GENERATE_UUID'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) LIKE 'convert(%varchar%,getdate(),%)' THEN 'today'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) = 'getdate()' THEN 'CURRENT_TIMESTAMP'
                 WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) LIKE '''%''' THEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4)
                 ELSE SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2)
             END
         ELSE c.COLUMN_DEFAULT
         END,
         c.IS_NULLABLE,
         COLUMNPROPERTY(object_id(c.TABLE_NAME), c.COLUMN_NAME, 'IsIdentity'),
         c.CHARACTER_MAXIMUM_LENGTH,
         c.NUMERIC_PRECISION,
         c.NUMERIC_PRECISION_RADIX,
         c.NUMERIC_SCALE,
         c.DATETIME_PRECISION,
         c.CHARACTER_SET_NAME,
         c.COLLATION_NAME

    from INFORMATION_SCHEMA.COLUMNS c
         join INFORMATION_SCHEMA.TABLES t
              on c.TABLE_SCHEMA = t.TABLE_SCHEMA
             and c.TABLE_NAME = t.TABLE_NAME

   where     c.TABLE_CATALOG = 'database_oct_2021'
         and t.TABLE_TYPE = 'BASE TABLE'

         and ((c.table_schema = 'dbo' and c.table_name NOT LIKE 'some_table')
           and (c.table_schema = 'schema1' and c.table_name NOT LIKE 'some_table')
           and (c.table_schema = 'schema2' and c.table_name NOT LIKE 'some_table'))

order by c.table_schema, c.table_name, c.ordinal_position;
2021-10-15T08:36:15.029000+02:00 SQL MSSQL: sending query: -- params: dbname
--         including
--         filter-list-to-where-clause including
--         excluding
--         filter-list-to-where-clause excluding
   SELECT
           REPLACE(KCU1.CONSTRAINT_NAME, '.', '_') AS 'CONSTRAINT_NAME'
         , KCU1.TABLE_SCHEMA AS 'TABLE_SCHEMA'
         , KCU1.TABLE_NAME AS 'TABLE_NAME'
         , KCU1.COLUMN_NAME AS 'COLUMN_NAME'
         , KCU2.TABLE_SCHEMA AS 'UNIQUE_TABLE_SCHEMA'
         , KCU2.TABLE_NAME AS 'UNIQUE_TABLE_NAME'
         , KCU2.COLUMN_NAME AS 'UNIQUE_COLUMN_NAME'
         , RC.UPDATE_RULE AS 'UPDATE_RULE'
         , RC.DELETE_RULE AS 'DELETE_RULE'

    FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
         JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
              ON KCU1.CONSTRAINT_CATALOG = RC.CONSTRAINT_CATALOG
                 AND KCU1.CONSTRAINT_SCHEMA = RC.CONSTRAINT_SCHEMA
                 AND KCU1.CONSTRAINT_NAME = RC.CONSTRAINT_NAME
         JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU2
              ON KCU2.CONSTRAINT_CATALOG = RC.UNIQUE_CONSTRAINT_CATALOG
                 AND KCU2.CONSTRAINT_SCHEMA = RC.UNIQUE_CONSTRAINT_SCHEMA
                 AND KCU2.CONSTRAINT_NAME = RC.UNIQUE_CONSTRAINT_NAME

   WHERE KCU1.ORDINAL_POSITION = KCU2.ORDINAL_POSITION
         AND KCU1.TABLE_CATALOG = 'database_oct_2021'
         AND KCU1.CONSTRAINT_CATALOG = 'database_oct_2021'
         AND KCU1.CONSTRAINT_SCHEMA NOT IN ('dto', 'sys')
         AND KCU1.TABLE_SCHEMA NOT IN ('dto', 'sys')
         AND KCU2.TABLE_SCHEMA NOT IN ('dto', 'sys')


         and ((kcu1.table_schema = 'dbo' and kcu1.table_name NOT LIKE 'some_table') and (kcu1.table_schema = 'schema1' and kcu1.table_name NOT LIKE 'some_table') and (kcu1.table_schema = 'schema2' and kcu1.table_name NOT LIKE 'some_table'))

ORDER BY KCU1.CONSTRAINT_NAME, KCU1.ORDINAL_POSITION;
2021-10-15T08:36:15.070000+02:00 INFO Processing source catalogs
2021-10-15T08:36:15.094000+02:00 NOTICE Prepare PostgreSQL database.
2021-10-15T08:36:15.191000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {1007B6D703}>
2021-10-15T08:36:15.192000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:15.193000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:15.193000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:15.193000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:15.194000+02:00 DEBUG BEGIN
2021-10-15T08:36:15.232000+02:00 SQL --
-- get user defined SQL types
--
  select nt.nspname,
         extname,
         typname,
         case when enum.enumtypid is not null
              then array_agg(enum.enumlabel order by enumsortorder)
          end as enumvalues

    from pg_class c
         join pg_namespace n on n.oid = c.relnamespace
         left join pg_attribute a on c.oid = a.attrelid and a.attnum > 0
         join pg_type t on t.oid = a.atttypid
         left join pg_namespace nt on nt.oid = t.typnamespace
         left join pg_depend d on d.classid = 'pg_type'::regclass
                              and d.refclassid = 'pg_extension'::regclass
                              and d.objid = t.oid
         left join pg_extension e on refobjid = e.oid
         left join pg_enum enum on enum.enumtypid = t.oid

   where nt.nspname !~ '^pg_' and nt.nspname <> 'information_schema'
         and n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
         and c.relkind in ('r', 'f', 'p')


         and
           (   t.typrelid = 0
            or
               (select c.relkind = 'c'
                 from pg_class c
                where c.oid = t.typrelid)
           )
           and not exists
             (
                select 1
                  from pg_type el
                 where el.oid = t.typelem
                   and el.typarray = t.oid
              )

group by nt.nspname, extname, typname, enumtypid
order by nt.nspname, extname, typname, enumtypid;
2021-10-15T08:36:15.288000+02:00 SQL -- params: table-type-name
--         including
--         filter-list-to-where-clause for including
--         excluding
--         filter-list-to-where-clause for excluding
with seqattr as
 (
   select adrelid,
          adnum,
          pg_get_expr(d.adbin, d.adrelid) as adsrc,
          case when pg_get_expr(d.adbin, d.adrelid) ~ 'nextval'
               then substring(pg_get_expr(d.adbin, d.adrelid)
                              from '''([^'']+)'''
                    )
               else null
           end as seqname
     from pg_attrdef d
 )
    select nspname, relname, c.oid, attname,
           t.oid::regtype as type,
           case when atttypmod > 0
                then substring(format_type(t.oid, atttypmod) from '\d+(?:,\d+)?')
                else null
            end as typmod,
           attnotnull,
           case when atthasdef
                then pg_get_expr(def.adbin, def.adrelid)
            end as default           ,
           case when s.seqname is not null then 'auto_increment' end as extra
      from pg_class c
           join pg_namespace n on n.oid = c.relnamespace
           left join pg_attribute a on c.oid = a.attrelid
           join pg_type t on t.oid = a.atttypid and attnum > 0
           left join pg_attrdef def on a.attrelid = def.adrelid
                                   and a.attnum = def.adnum
                                   and a.atthasdef
           left join seqattr s on def.adrelid = s.adrelid
                              and def.adnum = s.adnum

     where nspname !~ '^pg_' and n.nspname <> 'information_schema'
           and relkind in ('r', 'f', 'p')



  order by nspname, relname, attnum;
2021-10-15T08:36:15.322000+02:00 SQL -- params: including
--         filter-list-to-where-clause for including
--         excluding
--         filter-list-to-where-clause for excluding
  select n.nspname,
         i.relname,
         i.oid,
         rn.nspname,
         r.relname,
         indisprimary,
         indisunique,
         (select string_agg(attname, ',')
            from pg_attribute
           where attrelid = r.oid
             and array[attnum::integer] <@ indkey::integer[]
         ) as cols,
         pg_get_indexdef(indexrelid),
         c.conname,
         pg_get_constraintdef(c.oid)
    from pg_index x
         join pg_class i ON i.oid = x.indexrelid
         join pg_class r ON r.oid = x.indrelid
         join pg_namespace n ON n.oid = i.relnamespace
         join pg_namespace rn ON rn.oid = r.relnamespace
         left join pg_depend d on d.classid = 'pg_class'::regclass
                              and d.objid = i.oid
                              and d.refclassid = 'pg_constraint'::regclass
                              and d.deptype = 'i'
         left join pg_constraint c ON c.oid = d.refobjid
   where n.nspname !~ '^pg_' and n.nspname <> 'information_schema'


order by n.nspname, r.relname
2021-10-15T08:36:15.360000+02:00 SQL -- params: including (table)
--         filter-list-to-where-clause for including
--         excluding (table)
--         filter-list-to-where-clause for excluding
--         including (ftable)
--         filter-list-to-where-clause for including
--         excluding (ftable)
--         filter-list-to-where-clause for excluding
 select n.nspname, c.relname, nf.nspname, cf.relname as frelname,
        r.oid,
        d.refobjid as pkeyoid,
        conname,
        pg_catalog.pg_get_constraintdef(r.oid, true) as condef,
        (select string_agg(attname, ',')
           from pg_attribute
          where attrelid = r.conrelid
            and array[attnum::integer] <@ conkey::integer[]
        ) as conkey,
        (select string_agg(attname, ',')
           from pg_attribute
          where attrelid = r.confrelid
            and array[attnum::integer] <@ confkey::integer[]
        ) as confkey,
        confupdtype, confdeltype, confmatchtype,
        condeferrable, condeferred
   from pg_catalog.pg_constraint r
        JOIN pg_class c on r.conrelid = c.oid
        JOIN pg_namespace n on c.relnamespace = n.oid
        JOIN pg_class cf on r.confrelid = cf.oid
        JOIN pg_namespace nf on cf.relnamespace = nf.oid
        JOIN pg_depend d on d.classid = 'pg_constraint'::regclass
                        and d.refclassid = 'pg_class'::regclass
                        and d.objid = r.oid
                        and d.refobjsubid = 0
   where r.contype = 'f'
         AND c.relkind in ('r', 'f', 'p')
         AND cf.relkind in ('r', 'f', 'p')
         AND n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
         AND nf.nspname !~ '^pg_' and nf.nspname <> 'information_schema'




2021-10-15T08:36:15.387000+02:00 SQL -- params pkey-oid-list
--        fkey-oild-list
with pkeys(oid) as (
  values(17438),(17436),(17449),(17426),(17419),(17457),(17468),(17518),(17520),(17507),(17515),(17479),(17492),(17490),(17383),(17381),(17402),(17517),(17409),(17391),(17535),(17537),(17549),(17547),(17321),(17323),(17333),(17335),(17191),(17193),(17310),(17308),(17344),(17342),(17367),(17365)
),
     knownfkeys(oid) as (
  values(17610),(17615),(17620),(17630),(17625),(17635)
),
  pkdeps as (
  select pkeys.oid, pg_depend.objid
    from pg_depend
         join pkeys on pg_depend.refobjid = pkeys.oid
   where     classid = 'pg_catalog.pg_constraint'::regclass
         and refclassid = 'pg_catalog.pg_class'::regclass
)
 select n.nspname, c.relname, nf.nspname, cf.relname as frelname,
        r.oid as conoid, conname,
        pg_catalog.pg_get_constraintdef(r.oid, true) as condef,
        pkdeps.oid as index_oid
   from pg_catalog.pg_constraint r
        JOIN pkdeps on r.oid = pkdeps.objid
        JOIN pg_class c on r.conrelid = c.oid
        JOIN pg_namespace n on c.relnamespace = n.oid
        JOIN pg_class cf on r.confrelid = cf.oid
        JOIN pg_namespace nf on cf.relnamespace = nf.oid
  where NOT EXISTS (select 1 from knownfkeys where oid = r.oid)
2021-10-15T08:36:15.391000+02:00 DEBUG fetch-pgsql-catalog: 21 tables, 36 indexes, 6+6 fkeys
2021-10-15T08:36:15.391000+02:00 INFO Done with COPYing data, waiting for indexes
2021-10-15T08:36:15.405000+02:00 NOTICE Completing PostgreSQL database.
2021-10-15T08:36:15.421000+02:00 NOTICE Reset sequences
2021-10-15T08:36:15.507000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {100C450043}>
2021-10-15T08:36:15.508000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:15.509000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:15.510000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:15.510000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:15.510000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:15.511000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:15.511000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:15.512000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:15.652000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {100CB25503}>
2021-10-15T08:36:15.653000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:15.654000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:15.654000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:15.654000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:15.656000+02:00 DEBUG BEGIN
2021-10-15T08:36:15.667000+02:00 LOG report summary reset
             table name     errors       read   imported      bytes      total time       read      write
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
            before load          0          1          1                     0.102s
        fetch meta data          0          0          0                     0.343s
      Drop Foreign Keys          0          0          0                     0.000s
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
COPY Threads Completion          0          4          4                     0.000s
        Reset Sequences          0         13         13                     0.151s
    Create Foreign Keys          0          0          0                     0.000s
       Install Comments          0          0          0                     0.000s
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
      Total import time          ✓          0          0                     0.151s
2021-10-15T08:36:15.731000+02:00 INFO Stopping monitor
  • data that is being loaded, if relevant

No data, that is the issue. I would expect data to be parsed from a total of 18 tables in my case. 3 tables, each from a different schema should be excluded.

  • How the data is different from what you expected, if relevant
troley added a commit to semestry/pgloader that referenced this issue Oct 15, 2021
Allows a pgloader load file to parse data correctly when multiple
`excluding table names like 'TABLE' in schema 'SCHEMA'`
filter rules have been provided.

Fixes dimitri#1328
troley added a commit to semestry/pgloader that referenced this issue Oct 15, 2021
Allows a pgloader load file to parse data correctly when multiple
`excluding table names like 'TABLE' in schema 'SCHEMA'`
filter rules have been provided.

Fixes dimitri#1328
marcokrikke pushed a commit to semestry/pgloader that referenced this issue Nov 30, 2023
Allows a pgloader load file to parse data correctly when multiple
`excluding table names like 'TABLE' in schema 'SCHEMA'`
filter rules have been provided.

Fixes dimitri#1328
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant