下午好 我们是Rostelecom数据管理部门之一的系统分析师团队。 我们公司拥有300多种异构数据源-各种各样的数据源对于支持Rostelecom在所有众多领域的工作都是必不可少的。 我们研究数据源,并在必要时将其部分上传到存储循环。
表面上的好奇心流浪者。 它还具有许多异构数据源。 图片取自therahnuma.com。在此过程中,区分了两个子任务:定义一种策略,用于根据源表的属性从源表收集数据,并准备作为数据仓库“接收方”的表。 为此,我们使用各种GUI和反向工程工具。 另外,在收集信息时,系统分析师开始获取对DBMS(主要是Oracle)的信息表的辅助查询池。 在本文中,我将分享我们团队使用的此类脚本的“绅士组合”。
首先,简要列出所有脚本:
- 许多脚本使用xmlagg来聚合字符串,因为listagg无法处理串联导致的太长的字符串。
- 在除“过程,函数和包”之外的所有脚本中,目标表都是通过“ with”块中的过滤器表指定的。 填写方案名称和表名称。
- 每个脚本都附带一个或多个用例,规范说明(结果集)以及已用系统表的列表(以评估在特定数据库上使用的可能性)。
脚本“关于表的信息”
规格 :
使用的系统表: all_tab_columns,all_tab_comments,all_tab_statistics,all_part_key_columns,all_subpart_key_columns。
该查询对于确定从源系统上载数据的策略很有用。 如果主键建立在所考虑的表上,则可以组织卸载并在其上随后分配“增量”。 如果有时间戳记(例如,在技术领域中提供有关插入数据或更新的信息),则可以组织一段时间内仅上载已更改/添加的记录的上载。 创建类似的表“接收器”时,有关分区结构的信息会派上用场。
要求正文:with filter (owner, table_name) as ( select 'SCHEMA_NAME_1', t.* from table( sys.odcivarchar2list( 'TABLE_NAME_1' , 'TABLE_NAME_2' ) ) t union all select owner , table_name from all_tables where owner = 'SCHEMA_NAME_2' ) select a.owner as schema_name , a.table_name , e.comments , b.height , c.width , d.datetime_columns , b.avg_row_len , p.part_key , s.subpart_key from filter a left join ( select owner , table_name , num_rows as height , avg_row_len from all_tab_statistics where object_type = 'TABLE' ) b on a.table_name = b.table_name and a.owner = b.owner left join ( select owner , table_name , count(1) as width from all_tab_columns group by owner , table_name ) c on a.table_name = c.table_name and a.owner = c.owner left join ( select owner , table_name , listagg( column_name || ' (' || data_type || ')' , ', ' ) within group (order by column_id) as datetime_columns from all_tab_columns where data_type = 'DATE' or data_type like 'TIMESTAMP%' or data_type like 'INTERVAL%' or lower(column_name) like '%period%' or lower(column_name) like '%date%' or lower(column_name) like '%time%' group by owner , table_name ) d on a.table_name = d.table_name and a.owner = d.owner left join ( select owner , table_name , comments from all_tab_comments where table_type = 'TABLE' ) e on a.table_name = e.table_name and a.owner = e.owner left join ( select owner , name as table_name , listagg( column_name , ', ' ) within group (order by column_position) as part_key from all_part_key_columns where object_type = 'TABLE' group by owner , name ) p on a.owner = p.owner and a.table_name = p.table_name left join ( select owner , name as table_name , listagg( column_name , ', ' ) within group (order by column_position) as subpart_key from all_subpart_key_columns where object_type = 'TABLE' group by owner , name ) s on a.owner = s.owner and a.table_name = s.table_name order by e.owner , e.table_name ;
脚本“分区和子分区”
规格:
使用的系统表: all_tab_partitions,all_tab_subpartitions,all_part_key_columns,all_subpart_key_columns。
当分区直接用作数据源时,该脚本对于获取分区的特征(名称,大小)很有用。
要求正文: with filter (owner, table_name) as ( select 'SCHEMA_NAME_1', t.* from table( sys.odcivarchar2list( 'TABLE_NAME_1' , 'TABLE_NAME_2' ) ) t union all select owner , table_name from all_tables where owner = 'SCHEMA_NAME_2' ) select f.owner as schema_name , f.table_name , p.part_key , pc.partition_name , pc.partition_position , pc.num_rows as partition_height , s.subpart_key , sc.subpartition_name , sc.subpartition_position , sc.num_rows as subpartition_height from filter f join ( select owner , name as table_name , listagg( column_name , ', ' ) within group (order by column_position) as part_key from all_part_key_columns where object_type = 'TABLE' group by owner , name ) p on f.owner = p.owner and f.table_name = p.table_name left join all_tab_partitions pc on p.table_name = pc.table_name and p.owner = pc.table_owner left join ( select owner , name as table_name , listagg( column_name , ', ' ) within group (order by column_position) as subpart_key from all_subpart_key_columns where object_type = 'TABLE' group by owner , name ) s on p.owner = s.owner and p.table_name = s.table_name left join all_tab_subpartitions sc on f.owner = sc.table_owner and f.table_name = sc.table_name and pc.partition_name = sc.partition_name order by f.owner , f.table_name ;
脚本“表的属性组成”
规格:
使用的系统表: all_tables,all_constraints,all_cons_columns,all_tab_columns,all_col_comments,v $ nls_parameters。
当需要有关表,表与其他表的关系以及完整的属性组成的详细信息时,该脚本对于在数据仓库中准备“接收器”表很有用。 filter2表用于过滤要搜索链接(从和到)的表。 默认情况下,表是从除系统方案以外的所有方案中获取的。
要求正文: with filter (owner, table_name) as ( select 'SCHEMA_NAME_1', t.* from table( sys.odcivarchar2list( 'TABLE_NAME_1' , 'TABLE_NAME_2' ) ) t union all select owner , table_name from all_tables where owner = 'SCHEMA_NAME_2' ) , filter2 (owner, table_name) as ( select owner, table_name from all_tables where owner not in ('MDSYS', 'CTXSYS', 'SYSTEM', 'XDB', 'SYS') ) , refs as ( select b.constraint_type as from_constraint_type , b.constraint_name as from_constraint_name , d.position as from_position , d.column_name as from_column_name , b.table_name as from_table_name , b.owner as from_owner , a.owner as to_owner , a.table_name as to_table_name , c.column_name as to_column_name , c.position as to_position , a.constraint_name as to_constraint_name , a.constraint_type as to_constraint_type from all_constraints a left join all_constraints b on a.r_constraint_name = b.constraint_name and a.r_owner = b.owner left join all_cons_columns c on a.constraint_name = c.constraint_name and a.table_name = c.table_name and a.owner = c.owner left join all_cons_columns d on b.constraint_name = d.constraint_name and b.table_name = d.table_name and b.owner = d.owner where a.constraint_type = 'R' and b.constraint_type in ('P', 'U') and c.position = d.position ) , depends as ( select rtrim( xmlagg( xmlelement( e , to_owner || '.' || to_table_name || '.' || to_column_name , ', ' ).extract('//text()') order by to_owner ).getclobval() , ', ' ) as val , from_owner as owner , from_table_name as table_name , from_column_name as column_name from refs where (to_owner, to_table_name) in (select * from filter2) group by from_table_name , from_column_name , from_owner ) , impacts as ( select rtrim( xmlagg( xmlelement( e , from_owner || '.' || from_table_name || '.' || from_column_name , ', ' ).extract('//text()') order by from_owner ).getclobval() , ', ' ) as val , to_owner as owner , to_table_name as table_name , to_column_name as column_name from refs where (from_owner, from_table_name) in (select * from filter2) group by to_table_name , to_column_name , to_owner ) select f.owner as schema_name , f.table_name , a.column_id , a.column_name , a.data_type , b.comments as column_comment , decode ( a.data_type , 'NUMBER', nvl(a.data_scale, 0) , '' ) as scale , decode ( a.data_type , 'NUMBER', nvl(a.data_precision, 38) , '' ) as precision , a.data_length as byte_length , case when a.data_type in ('CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2') then d.value end as encoding , case when a.data_type in ('CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2') then a.char_length
脚本“程序,函数和包”
规格:二手系统表: all_source
在分析源时,可能会出现一项任务,以研究系统中循环的数据流。 没有软件包,功能和过程的代码库,几乎是不可能的,尤其是在文档不完整或缺少的情况下。 为方便起见,通过脚本列出的对象可以表的形式表示。 可以使用控制台实用程序将请求的结果输出到流中,并通过简单的处理程序(bash脚本)将其重定向到文件,以供您喜欢的编辑器进一步研究。 此外,各种处理程序可以“挂”在输出流上-美化,展开等。
要求正文: select t.owner as schema_name , t.name as name , xmlagg( xmlelement( e , t.text , '' ).extract('//text()') order by t.line asc ).getclobval() as body , f.wrapped , t.type as type from ( select owner, name, type , case when lower(text) like '%wrapped%' then 1 else 0 end as wrapped from all_source where type in ( 'PACKAGE BODY' , 'PACKAGE' , 'FUNCTION' , 'PROCEDURE' ) and line = 1 and owner not in ('MDSYS', 'CTXSYS', 'SYSTEM', 'XDB', 'SYS') ) f join all_source t on f.owner = t.owner and f.name = t.name and f.type = t.type group by t.owner , t.name , t.type , f.wrapped order by t.owner , t.name , t.type ;
结论
上面描述的脚本帮助我们的系统分析员摆脱了收集数据库信息的许多常规任务,并专注于更具创造性的事物,例如加载策略和“接收器”表的结构。 我希望这些脚本也对您有用。 知道如何自动化这些任务和类似任务将很有趣。