缩短时间范围(加密货币,外汇,交易所)

前段时间,我的任务是编写一个程序来执行外汇市场报价(更确切地说是时间范围数据)的稀疏操作。

问题的陈述:以这种格式每隔1秒输入一次数据:

  • 工具名称(USDEUR对代码等),
  • UNIX时间格式的日期和时间,
  • 开盘价值(区间中第一笔交易的价格),
  • 高价值(最高价格),
  • 低价值
  • 收盘价(最后一笔交易的价格),
  • 交易量(交易量或交易量)。

必须确保表中数据的重新计算和同步:5秒,15秒,1分钟,5分钟,15分钟等。

所描述的数据存储格式称为OHLC或OHLCV(打开,高,低,关闭,音量)。 它经常使用,您可以立即在其上构建“日本蜡烛”图表。

图片

在削减的基础上,我描述了我可以想到的所有选项,如何稀疏(放大)接收到的数据,以进行分析,例如,比特币价格的冬季上涨,然后根据接收到的数据,您将立即构建一个“ Japanese Candles”图表(在MS Excel中也有这样的图表) ) 在上图中,此图表是针对“ bitstampUSD”工具的“ 1个月”时间框架构建的。 蜡烛的白色主体表示该时间间隔内的价格上涨,黑色表示价格下降,上下灯芯指示该时间间隔内达到的最高和最低价格。 背景-交易量。 可以清楚地看到,2017年12月,价格接近2万马克。

将为两个数据库引擎(Oracle和MS SQL)提供解决方案,这将以某种方式使它们可以针对特定任务进行比较(我们不会将比较结果推广到其他任务)。

然后,我以一种简单的方式解决了这个问题:在临时表中计算正确的细化并与目标表同步-删除目标表中存在但临时表中不存在的行,并添加临时表中存在但目标中不存在的行。 那时,客户满意了解决方案,于是我完成了任务。

但是现在我决定考虑所有选项,因为上述解决方案包含一个功能-很难一次针对两种情况进行优化:

  • 当目标表为空并且您需要添加大量数据时,
  • 当目标表很大时,您需要以小块添加数据。

这是由于以下事实:在该过程中,您必须连接目标表和临时表,并且需要附加到较大的表上,而不是相反。 在上述两种情况下,大/小互换。 优化器将基于统计信息确定连接顺序,并且统计信息可能已过时,并且决策可能不正确,这将导致严重的性能下降。

在本文中,我将介绍一次性稀疏方法,这些方法可能对读者进行分析很有用,例如,比特币价格的冬天上涨。

可以从github文章底部的链接下载在线稀疏过程。

到点...我的任务是从“ 1秒”时间间隔到下一个时间间隔,但是在这里,我正在考虑从事务级别(在源表中,字段STOCK_NAME,UT,ID,APRICE,AVOLUME)进行瘦化。 因为此类数据是由bitcoincharts.com发布的。
实际上,通过这样的命令执行从事务级别到“ 1秒”级别的抽取(操作员很容易转换为从“ 1秒”级别到较高级别的抽取):

在Oracle上:

select 1 as STRIPE_ID , STOCK_NAME , TRUNC_UT (UT, 1) as UT , avg (APRICE) keep (dense_rank first order by UT, ID) as AOPEN , max (APRICE) as AHIGH , min (APRICE) as ALOW , avg (APRICE) keep (dense_rank last order by UT, ID) as ACLOSE , sum (AVOLUME) as AVOLUME , sum (APRICE * AVOLUME) as AAMOUNT , count (*) as ACOUNT from TRANSACTIONS_RAW group by STOCK_NAME, TRUNC_UT (UT, 1); 

函数avg()keep(按UT,ID,dense_rank为第一顺序)的工作方式如下:由于请求为GROUP BY,因此每个组的计算均独立于其他组。 在每个组中,字符串按UT和ID排序,以density_rank编号。 由于遵循第一个函数,因此选择了稠密等级返回1(换句话说,选择了最小值)的行-选择了该间隔内的第一个事务。 对于此最小的UT,ID,如果有多行,则将考虑平均值。 但是在我们的例子中,由于ID的唯一性,将保证有一行,因此结果值将立即作为AOPEN返回。 很容易注意到,第一个函数替换了两个聚合函数。

在MS SQL

没有first / last函数(有first_value / last_value ,但不是)。 因此,您必须将表连接到自身。

我不会单独给出请求,但是您可以在dbo.THINNING_HABR_CALC过程中看到它。 当然,没有第一个/最后一个,它不是那么优雅,但是会起作用。

一个操作员怎么解决这个问题? (在这里,术语“一个运算符”并不意味着该运算符将是一个运算符,而是意味着不会有循环将数据“拉”到一行上。)

我将列出我知道的解决此问题的所有选项:

  1. SIMP(简单,简单的笛卡尔积),
  2. CALC(计算,上层的迭代变薄),
  3. CHIN(中国方式,一次又一次要求所有级别),
  4. UDAF(用户定义的汇总函数),
  5. PPTF(流水线和并行表函数,过程解决方案,但只有两个游标,实际上是两个SQL语句),
  6. MODE(模型,词组为MODEL),
  7. 和IDEA(理想的理想解决方案,现在可能无法使用)。

展望未来,我要说的是,程序PPTF解决方案在Oracle上最有效的情况很少见。

http://api.bitcoincharts.com/v1/csv下载交易文件
我建议选择kraken *文件。 localbtc *文件非常嘈杂-它们包含分散注意力的行和不切实际的价格。 所有kraken *都包含大约3100万笔交易,我建议从那里排除krakenEUR,这样交易就变成1100万笔。 这是最方便的测试量。

在Powershell中运行脚本以为Oracle的SQLLDR生成控制文件,并为MSSQL生成导入请求。

  # MODIFY PARAMETERS THERE $OracleConnectString = "THINNING/aaa@P-ORA11/ORCL" # For Oracle $PathToCSV = "Z:\10" # without trailing slash $filenames = Get-ChildItem -name *.csv Remove-Item *.ctl -ErrorAction SilentlyContinue Remove-Item *.log -ErrorAction SilentlyContinue Remove-Item *.bad -ErrorAction SilentlyContinue Remove-Item *.dsc -ErrorAction SilentlyContinue Remove-Item LoadData-Oracle.bat -ErrorAction SilentlyContinue Remove-Item LoadData-MSSQL.sql -ErrorAction SilentlyContinue ForEach ($FilenameExt in $Filenames) { Write-Host "Processing file: "$FilenameExt $StockName = $FilenameExt.substring(1, $FilenameExt.Length-5) $FilenameCtl = '.'+$Stockname+'.ctl' Add-Content -Path $FilenameCtl -Value "OPTIONS (DIRECT=TRUE, PARALLEL=FALSE, ROWS=1000000, SKIP_INDEX_MAINTENANCE=Y)" Add-Content -Path $FilenameCtl -Value "UNRECOVERABLE" Add-Content -Path $FilenameCtl -Value "LOAD DATA" Add-Content -Path $FilenameCtl -Value "INFILE '.$StockName.csv'" Add-Content -Path $FilenameCtl -Value "BADFILE '.$StockName.bad'" Add-Content -Path $FilenameCtl -Value "DISCARDFILE '.$StockName.dsc'" Add-Content -Path $FilenameCtl -Value "INTO TABLE TRANSACTIONS_RAW" Add-Content -Path $FilenameCtl -Value "APPEND" Add-Content -Path $FilenameCtl -Value "FIELDS TERMINATED BY ','" Add-Content -Path $FilenameCtl -Value "(ID SEQUENCE (0), STOCK_NAME constant '$StockName', UT, APRICE, AVOLUME)" Add-Content -Path LoadData-Oracle.bat -Value "sqlldr $OracleConnectString control=$FilenameCtl" Add-Content -Path LoadData-MSSQL.sql -Value "insert into TRANSACTIONS_RAW (STOCK_NAME, UT, APRICE, AVOLUME)" Add-Content -Path LoadData-MSSQL.sql -Value "select '$StockName' as STOCK_NAME, UT, APRICE, AVOLUME" Add-Content -Path LoadData-MSSQL.sql -Value "from openrowset (bulk '$PathToCSV\$FilenameExt', formatfile = '$PathToCSV\format_mssql.bcp') as T1;" Add-Content -Path LoadData-MSSQL.sql -Value "" } 

让我们在Oracle上创建一个事务表。

 create table TRANSACTIONS_RAW ( ID number not null , STOCK_NAME varchar2 (32) , UT number not null , APRICE number not null , AVOLUME number not null) pctfree 0 parallel 4 nologging; 

在Oracle上,运行LoadData-Oracle.bat文件 ,之前已在Powershell脚本的开头固定了连接参数。

我在虚拟机上工作。 下载8个kraken *文件中的所有1100万个交易文件(我跳过了EUR文件)大约花费了1分钟。

并创建将日期截断为区间边界的函数:

 create or replace function TRUNC_UT (p_UT number, p_StripeTypeId number) return number deterministic is begin return case p_StripeTypeId when 1 then trunc (p_UT / 1) * 1 when 2 then trunc (p_UT / 10) * 10 when 3 then trunc (p_UT / 60) * 60 when 4 then trunc (p_UT / 600) * 600 when 5 then trunc (p_UT / 3600) * 3600 when 6 then trunc (p_UT / ( 4 * 3600)) * ( 4 * 3600) when 7 then trunc (p_UT / (24 * 3600)) * (24 * 3600) when 8 then trunc ((trunc (date '1970-01-01' + p_UT / 86400, 'Month') - date '1970-01-01') * 86400) when 9 then trunc ((trunc (date '1970-01-01' + p_UT / 86400, 'year') - date '1970-01-01') * 86400) when 10 then 0 when 11 then 0 end; end; create or replace function UT2DATESTR (p_UT number) return varchar2 deterministic is begin return to_char (date '1970-01-01' + p_UT / 86400, 'YYYY.MM.DD HH24:MI:SS'); end; 

考虑选项。 首先,给出所有选项的代码,然后给出启动和测试的脚本。 首先,针对Oracle描述任务,然后针对MS SQL描述任务

选项1-SIMP(特惠)


整个交易集与笛卡尔乘积乘以一组10行,数字从1到10。这对于从交易的单行中获得10行,日期被截断为10个间隔的边界是必需的。

之后,将行按间隔号和截短日期分组,并执行上述请求。

创建视图:

 create or replace view THINNING_HABR_SIMP_V as select STRIPE_ID , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID) as UT , avg (APRICE) keep (dense_rank first order by UT, ID) as AOPEN , max (APRICE) as AHIGH , min (APRICE) as ALOW , avg (APRICE) keep (dense_rank last order by UT, ID) as ACLOSE , sum (AVOLUME) as AVOLUME , sum (APRICE * AVOLUME) as AAMOUNT , count (*) as ACOUNT from TRANSACTIONS_RAW , (select rownum as STRIPE_ID from dual connect by level <= 10) group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID); 

选项2-CALC(迭代计算)


在此选项中,我们迭代地将事务从级别1精简到级别1到级别2,依此类推。

创建一个表:

 create table QUOTES_CALC ( STRIPE_ID number not null , STOCK_NAME varchar2 (128) not null , UT number not null , AOPEN number not null , AHIGH number not null , ALOW number not null , ACLOSE number not null , AVOLUME number not null , AAMOUNT number not null , ACOUNT number not null ) /*partition by list (STRIPE_ID) ( partition P01 values (1) , partition P02 values (2) , partition P03 values (3) , partition P04 values (4) , partition P05 values (5) , partition P06 values (6) , partition P07 values (7) , partition P08 values (8) , partition P09 values (9) , partition P10 values (10) )*/ parallel 4 pctfree 0 nologging; 

您可以使用STRIPE_ID字段来创建索引,但是通过实验已经确定,对于没有索引的1100万笔交易,它更有利可图。 对于较大数量,情况可能会发生变化。 或者,您可以通过取消注释查询中的块来对表进行分区。

创建一个过程:

 create or replace procedure THINNING_HABR_CALC_T is begin rollback; execute immediate 'truncate table QUOTES_CALC'; insert --+ append into QUOTES_CALC select 1 as STRIPE_ID , STOCK_NAME , UT , avg (APRICE) keep (dense_rank first order by ID) , max (APRICE) , min (APRICE) , avg (APRICE) keep (dense_rank last order by ID) , sum (AVOLUME) , sum (APRICE * AVOLUME) , count (*) from TRANSACTIONS_RAW a group by STOCK_NAME, UT; commit; for i in 1..9 loop insert --+ append into QUOTES_CALC select --+ parallel(4) STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, i + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from QUOTES_CALC a where STRIPE_ID = i group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, i + 1); commit; end loop; end; / 

为了对称,请创建一个简单的VIEW:

 create view THINNING_HABR_CALC_V as select * from QUOTES_CALC; 

选项3-CHIN(中文代码)


该方法不同于该方法的残酷直截了当,是对“不要重复自己”原则的拒绝。 在这种情况下,拒绝循环。

仅在完整性方面提供此选项。

展望未来,我要说的是,在这项特定任务的绩效方面,它位居第二。

大要求
 create or replace view THINNING_HABR_CHIN_V as with T01 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select 1 , STOCK_NAME , UT , avg (APRICE) keep (dense_rank first order by ID) , max (APRICE) , min (APRICE) , avg (APRICE) keep (dense_rank last order by ID) , sum (AVOLUME) , sum (APRICE * AVOLUME) , count (*) from TRANSACTIONS_RAW group by STOCK_NAME, UT) , T02 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T01 group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1)) , T03 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T02 group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1)) , T04 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T03 group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1)) , T05 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T04 group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1)) , T06 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T05 group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1)) , T07 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T06 group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1)) , T08 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T07 group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1)) , T09 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T08 group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1)) , T10 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T09 group by STRIPE_ID, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1)) select * from T01 union all select * from T02 union all select * from T03 union all select * from T04 union all select * from T05 union all select * from T06 union all select * from T07 union all select * from T08 union all select * from T09 union all select * from T10; 


选项4-UDAF


此处将不提供带有用户定义的聚合函数的选项,但可以在github上查看。

选项5-PPTF(流水线和并行表功能)


创建一个函数(在包中):

 create or replace package THINNING_PPTF_P is type TRANSACTION_RECORD_T is record (STOCK_NAME varchar2(128), UT number, SEQ_NUM number, APRICE number, AVOLUME number); type CUR_RECORD_T is ref cursor return TRANSACTION_RECORD_T; type QUOTE_T is record (STRIPE_ID number, STOCK_NAME varchar2(128), UT number , AOPEN number, AHIGH number, ALOW number, ACLOSE number, AVOLUME number , AAMOUNT number, ACOUNT number); type QUOTE_LIST_T is table of QUOTE_T; function F (p_cursor CUR_RECORD_T) return QUOTE_LIST_T pipelined order p_cursor by (STOCK_NAME, UT, SEQ_NUM) parallel_enable (partition p_cursor by hash (STOCK_NAME)); end; / create or replace package body THINNING_PPTF_P is function F (p_cursor CUR_RECORD_T) return QUOTE_LIST_T pipelined order p_cursor by (STOCK_NAME, UT, SEQ_NUM) parallel_enable (partition p_cursor by hash (STOCK_NAME)) is QuoteTail QUOTE_LIST_T := QUOTE_LIST_T() ; rec TRANSACTION_RECORD_T; rec_prev TRANSACTION_RECORD_T; type ut_T is table of number index by pls_integer; ut number; begin QuoteTail.extend(10); loop fetch p_cursor into rec; exit when p_cursor%notfound; if rec_prev.STOCK_NAME = rec.STOCK_NAME then if (rec.STOCK_NAME = rec_prev.STOCK_NAME and rec.UT < rec_prev.UT) or (rec.STOCK_NAME = rec_prev.STOCK_NAME and rec.UT = rec_prev.UT and rec.SEQ_NUM < rec_prev.SEQ_NUM) then raise_application_error (-20010, 'Rowset must be ordered, ('||rec_prev.STOCK_NAME||','||rec_prev.UT||','||rec_prev.SEQ_NUM||') > ('||rec.STOCK_NAME||','||rec.UT||','||rec.SEQ_NUM||')'); end if; end if; if rec.STOCK_NAME <> rec_prev.STOCK_NAME or rec_prev.STOCK_NAME is null then for j in 1 .. 10 loop if QuoteTail(j).UT is not null then pipe row (QuoteTail(j)); QuoteTail(j) := null; end if; end loop; end if; for i in reverse 1..10 loop ut := TRUNC_UT (rec.UT, i); if QuoteTail(i).UT <> ut then for j in 1..i loop pipe row (QuoteTail(j)); QuoteTail(j) := null; end loop; end if; if QuoteTail(i).UT is null then QuoteTail(i).STRIPE_ID := i; QuoteTail(i).STOCK_NAME := rec.STOCK_NAME; QuoteTail(i).UT := ut; QuoteTail(i).AOPEN := rec.APRICE; end if; if rec.APRICE < QuoteTail(i).ALOW or QuoteTail(i).ALOW is null then QuoteTail(i).ALOW := rec.APRICE; end if; if rec.APRICE > QuoteTail(i).AHIGH or QuoteTail(i).AHIGH is null then QuoteTail(i).AHIGH := rec.APRICE; end if; QuoteTail(i).AVOLUME := nvl (QuoteTail(i).AVOLUME, 0) + rec.AVOLUME; QuoteTail(i).AAMOUNT := nvl (QuoteTail(i).AAMOUNT, 0) + rec.AVOLUME * rec.APRICE; QuoteTail(i).ACOUNT := nvl (QuoteTail(i).ACOUNT, 0) + 1; QuoteTail(i).ACLOSE := rec.APRICE; end loop; rec_prev := rec; end loop; for j in 1 .. 10 loop if QuoteTail(j).UT is not null then pipe row (QuoteTail(j)); end if; end loop; exception when no_data_needed then null; end; end; / 

创建视图:

 create or replace view THINNING_HABR_PPTF_V as select * from table (THINNING_PPTF_P.F (cursor (select STOCK_NAME, UT, ID, APRICE, AVOLUME from TRANSACTIONS_RAW))); 

选项6-MODE(模型子句)


该选项使用带有短语ITERATEMODEL子句短语迭代计算所有10个级别的抽取。

该选项也是不切实际的,因为它很慢。 在我的环境中,一分钟内就计算出8种工具的1000笔交易。 大部分时间都花在计算MODEL短语上。

在此,我仅出于完整性的目的给出此选项,并确认以下事实:在Oracle上,几乎所有任意复杂的计算都可以通过一个查询执行,而无需使用PL / SQL。

该查询中MODEL短语的性能低的原因之一是,对我们拥有6条的每个规则都执行了右边的搜索。前两个规则的计算非常快,因为有直接的显式寻址,没有玩笑话。 在剩余的四个规则中,有任何一个词-计算速度较慢。

第二个困难是您必须计算参考模型。 这是必需的,因为计算MODEL短语之前必须知道维列表,我们无法在此短语内计算新维。 也许可以借助两个MODEL短语来规避此问题,但是由于大量规则的性能不佳,我没有这样做。

我补充说,有可能不计算参考模型中的UT_OPENUT_CLOSE ,而是直接在MODEL短语中使用相同的函数avg()keep(dense_rank first / last order by) 。 但这会更加缓慢地发生。
由于性能限制,我不会在测试过程中包括此选项。

 with --       SOURCETRANS (STRIPE_ID, STOCK_NAME, PARENT_UT, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select 1, STOCK_NAME, TRUNC_UT (UT, 2), UT , avg (APRICE) keep (dense_rank first order by ID) , max (APRICE) , min (APRICE) , avg (APRICE) keep (dense_rank last order by ID) , sum (AVOLUME) , sum (AVOLUME * APRICE) , count (*) from TRANSACTIONS_RAW where ID <= 1000 --       group by STOCK_NAME, UT) --   PARENT_UT, UT  2...10    UT_OPEN, UT_CLOSE --    , REFMOD (STRIPE_ID, STOCK_NAME, PARENT_UT, UT, UT_OPEN, UT_CLOSE) as (select b.STRIPE_ID , a.STOCK_NAME , TRUNC_UT (UT, b.STRIPE_ID + 1) , TRUNC_UT (UT, b.STRIPE_ID) , min (TRUNC_UT (UT, b.STRIPE_ID - 1)) , max (TRUNC_UT (UT, b.STRIPE_ID - 1)) from SOURCETRANS a , (select rownum + 1 as STRIPE_ID from dual connect by level <= 9) b group by b.STRIPE_ID , a.STOCK_NAME , TRUNC_UT (UT, b.STRIPE_ID + 1) , TRUNC_UT (UT, b.STRIPE_ID)) --        , MAINTAB as ( select STRIPE_ID, STOCK_NAME, PARENT_UT, UT, AOPEN , AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT, null, null from SOURCETRANS union all select STRIPE_ID, STOCK_NAME, PARENT_UT, UT, null , null, null, null, null, null, null, UT_OPEN, UT_CLOSE from REFMOD) select STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT from MAINTAB model return all rows --      2...10 reference RM on (select * from REFMOD) dimension by (STRIPE_ID, STOCK_NAME, UT) measures (UT_OPEN, UT_CLOSE) main MM partition by (STOCK_NAME) dimension by (STRIPE_ID, PARENT_UT, UT) measures (AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) rules iterate (9) ( AOPEN [iteration_number + 2, any, any] = AOPEN [cv (STRIPE_ID) - 1, cv (UT) , rm.UT_OPEN [cv (STRIPE_ID), cv (STOCK_NAME), cv (UT)]] , ACLOSE [iteration_number + 2, any, any] = ACLOSE [cv (STRIPE_ID) - 1, cv (UT) , rm.UT_CLOSE[cv (STRIPE_ID), cv (STOCK_NAME), cv (UT)]] , AHIGH [iteration_number + 2, any, any] = max (AHIGH)[cv (STRIPE_ID) - 1, cv (UT), any] , ALOW [iteration_number + 2, any, any] = min (ALOW)[cv (STRIPE_ID) - 1, cv (UT), any] , AVOLUME [iteration_number + 2, any, any] = sum (AVOLUME)[cv (STRIPE_ID) - 1, cv (UT), any] , AAMOUNT [iteration_number + 2, any, any] = sum (AAMOUNT)[cv (STRIPE_ID) - 1, cv (UT), any] , ACOUNT [iteration_number + 2, any, any] = sum (ACOUNT)[cv (STRIPE_ID) - 1, cv (UT), any] ) order by 1, 2, 3, 4; 

选项6-IDEA(理想,理想但不起作用)


以下所述的请求可能是最有效的,并且消耗的资源量等于理论上的最小值。

但是Oracle和MS SQL都不允许您以这种形式编写查询。 我相信这是由标准决定的。

 with QUOTES_S1 as (select 1 as STRIPE_ID , STOCK_NAME , TRUNC_UT (UT, 1) as UT , avg (APRICE) keep (dense_rank first order by ID) as AOPEN , max (APRICE) as AHIGH , min (APRICE) as ALOW , avg (APRICE) keep (dense_rank last order by ID) as ACLOSE , sum (AVOLUME) as AVOLUME , sum (APRICE * AVOLUME) as AAMOUNT , count (*) as ACOUNT from TRANSACTIONS_RAW -- where rownum <= 100 group by STOCK_NAME, TRUNC_UT (UT, 1)) , T1 (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) as (select 1, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT from QUOTES_S1 union all select STRIPE_ID + 1 , STOCK_NAME , TRUNC_UT (UT, STRIPE_ID + 1) , avg (AOPEN) keep (dense_rank first order by UT) , max (AHIGH) , min (ALOW) , avg (ACLOSE) keep (dense_rank last order by UT) , sum (AVOLUME) , sum (AAMOUNT) , sum (ACOUNT) from T1 where STRIPE_ID < 10 group by STRIPE_ID + 1, STOCK_NAME, TRUNC_UT (UT, STRIPE_ID + 1) ) select * from T1 

此查询对应于Oracle文档的以下部分:

如果subquery_factoring_clause在定义它的子查询中引用了它自己的query_name,则可以说subquery_factoring_clause是递归的。 递归subquery_factoring_clause必须包含两个查询块:第一个是锚成员,第二个是递归成员。 锚成员必须出现在递归成员之前,并且它不能引用query_name。 锚成员可以由一个或多个由集合运算符组合的查询块组成:UNION ALL,UNION,INTERSECT或MINUS。 递归成员必须跟随锚成员,并且必须只引用一次query_name。 您必须使用UNION ALL set运算符将递归成员与锚成员结合在一起。

但这与文档的以下段落矛盾:

递归成员不能包含以下任何元素:
DISTINCT关键字或GROUP BY子句
聚合函数。 但是,选择列表中允许使用分析功能。

因此,在递归成员中,不允许进行聚合和分组。

测试中



让我们首先为Oracle做它。

执行CALC方法的计算过程,并记录其执行时间:

 exec THINNING_HABR_CALC_T 

四种方法的计算结果在四个视图中:

  • THINNING_HABR_SIMP_V(将执行计算,导致复杂的SELECT,因此将需要很长时间),
  • THINNING_HABR_CALC_V(将显示QUOTES_CALC表中的数据,因此将快速执行)
  • THINNING_HABR_CHIN_V(还将执行计算,导致复杂的SELECT,因此将需要很长时间),
  • THINNING_HABR_PPTF_V(将执行功能THINNING_HABR_PPTF)。

我已经测量了所有方法的交货时间,并在文章末尾的表格中给出了交货时间。

对于其余的VIEW,我们执行请求并写出执行时间:

 select count (*) as CNT , sum (STRIPE_ID) as S_STRIPE_ID, sum (UT) as S_UT , sum (AOPEN) as S_AOPEN, sum (AHIGH) as S_AHIGH, sum (ALOW) as S_ALOW , sum (ACLOSE) as S_ACLOSE, sum (AVOLUME) as S_AVOLUME , sum (AAMOUNT) as S_AAMOUNT, sum (ACOUNT) as S_ACOUNT from THINNING_HABR_XXXX_V 

其中XXXX是SIMP,CHIN,PPTF。

这些VIEW计算招聘摘要。 要计算摘要,您需要获取所有行,并使用摘要可以将集合相互比较。

您还可以使用dbms_sqlhash软件包比较集,但这要慢得多,因为您需要对原始集进行排序,并且哈希计算并不快。
同样在12c中,有一个包DBMS_COMPARISON。

您可以同时检查所有算法的正确性。 我们认为摘要是这样的请求(在虚拟机中有11M条目,这将相对较长,大约15分钟):

 with T1 as (select 'SIMP' as ALG_NAME, a.* from THINNING_HABR_SIMP_V a union all select 'CALC', a.* from THINNING_HABR_CALC_V a union all select 'CHIN', a.* from THINNING_HABR_CHIN_V a union all select 'PPTF', a.* from THINNING_HABR_PPTF_V a) select ALG_NAME , count (*) as CNT , sum (STRIPE_ID) as S_STRIPE_ID, sum (UT) as S_UT , sum (AOPEN) as S_AOPEN, sum (AHIGH) as S_AHIGH, sum (ALOW) as S_ALOW , sum (ACLOSE) as S_ACLOSE, sum (AVOLUME) as S_AVOLUME , sum (AAMOUNT) as S_AAMOUNT, sum (ACOUNT) as S_ACOUNT from T1 group by ALG_NAME; 

我们看到摘要是相同的,因此所有算法给出的结果都是相同的。

现在,我们将在MS SQL中重现所有相同内容。 我在2016版上进行了测试。

首先创建DBTEST数据库,然后在其中创建一个事务表:

 use DBTEST go create table TRANSACTIONS_RAW ( STOCK_NAME varchar (32) not null , UT int not null , APRICE numeric (22, 12) not null , AVOLUME numeric (22, 12) not null , ID bigint identity not null ); 

下载下载的数据。

在MSSQL中,创建文件format_mssql.bcp:

 12.0 3 1 SQLCHAR 0 0 "," 3 UT "" 2 SQLCHAR 0 0 "," 4 APRICE "" 3 SQLCHAR 0 0 "\n" 5 AVOLUME "" 

并在SSMS中运行LoadData-MSSQL.sql脚本(此脚本由本文部分中针对Oracle的唯一Powershell脚本生成)。

让我们创建两个函数:

 use DBTEST go create or alter function TRUNC_UT (@p_UT bigint, @p_StripeTypeId int) returns bigint as begin return case @p_StripeTypeId when 1 then @p_UT when 2 then @p_UT / 10 * 10 when 3 then @p_UT / 60 * 60 when 4 then @p_UT / 600 * 600 when 5 then @p_UT / 3600 * 3600 when 6 then @p_UT / 14400 * 14400 when 7 then @p_UT / 86400 * 86400 when 8 then datediff (second, cast ('1970-01-01 00:00:00' as datetime), dateadd(m, datediff (m, 0, dateadd (second, @p_UT, cast ('1970-01-01 00:00:00' as datetime))), 0)) when 9 then datediff (second, cast ('1970-01-01 00:00:00' as datetime), dateadd(yy, datediff (yy, 0, dateadd (second, @p_UT, cast ('1970-01-01 00:00:00' as datetime))), 0)) when 10 then 0 when 11 then 0 end; end; go create or alter function UT2DATESTR (@p_UT bigint) returns datetime as begin return dateadd(s, @p_UT, cast ('1970-01-01 00:00:00' as datetime)); end; go 

我们继续执行这些选项:

选项1-SIMP


运行:

 use DBTEST go create or alter view dbo.THINNING_HABR_SIMP_V as with T1 (STRIPE_ID) as (select 1 union all select STRIPE_ID + 1 from T1 where STRIPE_ID < 10) , T2 as (select STRIPE_ID , STOCK_NAME , dbo.TRUNC_UT (UT, STRIPE_ID) as UT , min (1000000 * cast (UT as bigint) + ID) as AOPEN_UT , max (APRICE) as AHIGH , min (APRICE) as ALOW , max (1000000 * cast (UT as bigint) + ID) as ACLOSE_UT , sum (AVOLUME) as AVOLUME , sum (APRICE * AVOLUME) as AAMOUNT , count (*) as ACOUNT from TRANSACTIONS_RAW, T1 group by STRIPE_ID, STOCK_NAME, dbo.TRUNC_UT (UT, STRIPE_ID)) select t.STRIPE_ID, t.STOCK_NAME, t.UT, t_op.APRICE as AOPEN, t.AHIGH , t.ALOW, t_cl.APRICE as ACLOSE, t.AVOLUME, t.AAMOUNT, t.ACOUNT from T2 t join TRANSACTIONS_RAW t_op on (t.STOCK_NAME = t_op.STOCK_NAME and t.AOPEN_UT / 1000000 = t_op.UT and t.AOPEN_UT % 1000000 = t_op.ID) join TRANSACTIONS_RAW t_cl on (t.STOCK_NAME = t_cl.STOCK_NAME and t.ACLOSE_UT / 1000000 = t_cl.UT and t.ACLOSE_UT % 1000000 = t_cl.ID); 

缺少的第一个/最后一个功能是通过双表自连接实现的。

选项2-CALC


创建一个表,过程和视图:

 use DBTEST go create table dbo.QUOTES_CALC ( STRIPE_ID int not null , STOCK_NAME varchar(32) not null , UT bigint not null , AOPEN numeric (22, 12) not null , AHIGH numeric (22, 12) not null , ALOW numeric (22, 12) not null , ACLOSE numeric (22, 12) not null , AVOLUME numeric (38, 12) not null , AAMOUNT numeric (38, 12) not null , ACOUNT int not null ); go create or alter procedure dbo.THINNING_HABR_CALC as begin set nocount on; truncate table QUOTES_CALC; declare @StripeId int; with T1 as (select STOCK_NAME , UT , min (ID) as AOPEN_ID , max (APRICE) as AHIGH , min (APRICE) as ALOW , max (ID) as ACLOSE_ID , sum (AVOLUME) as AVOLUME , sum (APRICE * AVOLUME) as AAMOUNT , count (*) as ACOUNT from TRANSACTIONS_RAW group by STOCK_NAME, UT) insert into QUOTES_CALC (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) select 1, t.STOCK_NAME, t.UT, t_op.APRICE, t.AHIGH, t.ALOW, t_cl.APRICE, t.AVOLUME, t.AAMOUNT, t.ACOUNT from T1 t join TRANSACTIONS_RAW t_op on (t.STOCK_NAME = t_op.STOCK_NAME and t.UT = t_op.UT and t.AOPEN_ID = t_op.ID) join TRANSACTIONS_RAW t_cl on (t.STOCK_NAME = t_cl.STOCK_NAME and t.UT = t_cl.UT and t.ACLOSE_ID = t_cl.ID); set @StripeId = 1; while (@StripeId <= 9) begin with T1 as (select STOCK_NAME , dbo.TRUNC_UT (UT, @StripeId + 1) as UT , min (UT) as AOPEN_UT , max (AHIGH) as AHIGH , min (ALOW) as ALOW , max (UT) as ACLOSE_UT , sum (AVOLUME) as AVOLUME , sum (AAMOUNT) as AAMOUNT , sum (ACOUNT) as ACOUNT from QUOTES_CALC where STRIPE_ID = @StripeId group by STOCK_NAME, dbo.TRUNC_UT (UT, @StripeId + 1)) insert into QUOTES_CALC (STRIPE_ID, STOCK_NAME, UT, AOPEN, AHIGH, ALOW, ACLOSE, AVOLUME, AAMOUNT, ACOUNT) select @StripeId + 1, t.STOCK_NAME, t.UT, t_op.AOPEN, t.AHIGH, t.ALOW, t_cl.ACLOSE, t.AVOLUME, t.AAMOUNT, t.ACOUNT from T1 t join QUOTES_CALC t_op on (t.STOCK_NAME = t_op.STOCK_NAME and t.AOPEN_UT = t_op.UT) join QUOTES_CALC t_cl on (t.STOCK_NAME = t_cl.STOCK_NAME and t.ACLOSE_UT = t_cl.UT) where t_op.STRIPE_ID = @StripeId and t_cl.STRIPE_ID = @StripeId; set @StripeId = @StripeId + 1; end; end; go create or alter view dbo.THINNING_HABR_CALC_V as select * from dbo.QUOTES_CALC; go 

我没有在MS SQL上实现选项3(CHIN)和4(UDAF)。

选项5-PPTF


创建一个表函数并查看。该函数只是一个表函数,而不是并行管道表函数,只是该选项保留了Oracle的历史名称:

 use DBTEST go create or alter function dbo.THINNING_HABR_PPTF () returns @rettab table ( STRIPE_ID bigint not null , STOCK_NAME varchar(32) not null , UT bigint not null , AOPEN numeric (22, 12) not null , AHIGH numeric (22, 12) not null , ALOW numeric (22, 12) not null , ACLOSE numeric (22, 12) not null , AVOLUME numeric (38, 12) not null , AAMOUNT numeric (38, 12) not null , ACOUNT bigint not null) as begin declare @i tinyint; declare @tut int; declare @trans_STOCK_NAME varchar(32); declare @trans_UT int; declare @trans_ID int; declare @trans_APRICE numeric (22,12); declare @trans_AVOLUME numeric (22,12); declare @trans_prev_STOCK_NAME varchar(32); declare @trans_prev_UT int; declare @trans_prev_ID int; declare @trans_prev_APRICE numeric (22,12); declare @trans_prev_AVOLUME numeric (22,12); declare @QuoteTail table ( STRIPE_ID bigint not null primary key clustered , STOCK_NAME varchar(32) not null , UT bigint not null , AOPEN numeric (22, 12) not null , AHIGH numeric (22, 12) , ALOW numeric (22, 12) , ACLOSE numeric (22, 12) , AVOLUME numeric (38, 12) not null , AAMOUNT numeric (38, 12) not null , ACOUNT bigint not null); declare c cursor fast_forward for select STOCK_NAME, UT, ID, APRICE, AVOLUME from TRANSACTIONS_RAW order by STOCK_NAME, UT, ID; -- THIS ORDERING (STOCK_NAME, UT, ID) IS MANDATORY open c; fetch next from c into @trans_STOCK_NAME, @trans_UT, @trans_ID, @trans_APRICE, @trans_AVOLUME; while @@fetch_status = 0 begin if @trans_STOCK_NAME <> @trans_prev_STOCK_NAME or @trans_prev_STOCK_NAME is null begin insert into @rettab select * from @QuoteTail; delete @QuoteTail; end; set @i = 10; while @i >= 1 begin set @tut = dbo.TRUNC_UT (@trans_UT, @i); if @tut <> (select UT from @QuoteTail where STRIPE_ID = @i) begin insert into @rettab select * from @QuoteTail where STRIPE_ID <= @i; delete @QuoteTail where STRIPE_ID <= @i; end; if (select count (*) from @QuoteTail where STRIPE_ID = @i) = 0 begin insert into @QuoteTail (STRIPE_ID, STOCK_NAME, UT, AOPEN, AVOLUME, AAMOUNT, ACOUNT) values (@i, @trans_STOCK_NAME, @tut, @trans_APRICE, 0, 0, 0); end; update @QuoteTail set AHIGH = case when AHIGH < @trans_APRICE or AHIGH is null then @trans_APRICE else AHIGH end , ALOW = case when ALOW > @trans_APRICE or ALOW is null then @trans_APRICE else ALOW end , ACLOSE = @trans_APRICE, AVOLUME = AVOLUME + @trans_AVOLUME , AAMOUNT = AAMOUNT + @trans_APRICE * @trans_AVOLUME , ACOUNT = ACOUNT + 1 where STRIPE_ID = @i; set @i = @i - 1; end; set @trans_prev_STOCK_NAME = @trans_STOCK_NAME; set @trans_prev_UT = @trans_UT; set @trans_prev_ID = @trans_ID; set @trans_prev_APRICE = @trans_APRICE; set @trans_prev_AVOLUME = @trans_AVOLUME; fetch next from c into @trans_STOCK_NAME, @trans_UT, @trans_ID, @trans_APRICE, @trans_AVOLUME; end; close c; deallocate c; insert into @rettab select * from @QuoteTail; return; end go create or alter view dbo.THINNING_HABR_PPTF_V as select * from dbo.THINNING_HABR_PPTF (); 

让我们为CALC方法计算QUOTES_CALC表并写出执行时间:
 use DBTEST go exec dbo.THINNING_HABR_CALC 

这三种方法的计算结果在以下三个视图中:

  • THINNING_HABR_SIMP_V(将执行计算,导致复杂的SELECT,因此将需要很长时间),
  • THINNING_HABR_CALC_V(将显示QUOTES_CALC表中的数据,因此将快速执行)
  • THINNING_HABR_PPTF_V(将执行功能THINNING_HABR_PPTF)。

对于两个VIEW,我们执行请求并写出执行时间:

 select count (*) as CNT , sum (STRIPE_ID) as S_STRIPE_ID, sum (UT) as S_UT , sum (AOPEN) as S_AOPEN, sum (AHIGH) as S_AHIGH, sum (ALOW) as S_ALOW , sum (ACLOSE) as S_ACLOSE, sum (AVOLUME) as S_AVOLUME , sum (AAMOUNT) as S_AAMOUNT, sum (ACOUNT) as S_ACOUNT from THINNING_HABR_XXXX_V 

其中XXXX是SIMP,PPTF。

现在,您可以比较MS SQL的三种方法的计算结果。可以在一个请求中完成。运行:

 use DBTEST go with T1 as (select 'SIMP' as ALG_NAME, a.* from THINNING_HABR_SIMP_V a union all select 'CALC', a.* from THINNING_HABR_CALC_V a union all select 'PPTF', a.* from THINNING_HABR_PPTF_V a) select ALG_NAME , count (*) as CNT, sum (cast (STRIPE_ID as bigint)) as STRIPE_ID , sum (cast (UT as bigint)) as UT, sum (AOPEN) as AOPEN , sum (AHIGH) as AHIGH, sum (ALOW) as ALOW, sum (ACLOSE) as ACLOSE, sum (AVOLUME) as AVOLUME , sum (AAMOUNT) as AAMOUNT, sum (cast (ACOUNT as bigint)) as ACOUNT from T1 group by ALG_NAME; 

如果三行在所有字段上重合,则使用这三种方法的计算结果将相同。

我强烈建议您在测试阶段使用少量选择,因为此任务在MS SQL中的性能较低。

如果只有MS SQL引擎并且要计算大量数据,则可以尝试以下优化方法:可以创建索引:

 create unique clustered index TRANSACTIONS_RAW_I1 on TRANSACTIONS_RAW (STOCK_NAME, UT, ID); create unique clustered index QUOTES_CALC_I1 on QUOTES_CALC (STRIPE_ID, STOCK_NAME, UT); 

在我的虚拟机上测量性能的结果如下:

图片

可以从github下载脚本:Oracle,THINNING方案-本文的脚本,THINNING_LIVE方案- 从bitcoincharts.com 在线下载数据并在线精简(但该站点仅以在线模式发送最近5天的数据),以及该脚本这篇文章也适用于MS SQL。

结论:

该任务在Oracle上比在MS SQL上解决得更快。随着交易数量的增加,差距变得越来越大。

在Oracle上,PPTF是最佳选择。在这里,程序化方法被证明更有利可图,这种情况很少发生。其他方法也显示了可接受的结果-我什至在虚拟机中测试了367M事务的数量(PPTF方法计算了一个半小时的间隔时间)。

在MS SQL上,迭代计算方法(CALC)证明是最有效的。

为什么Oracle上的PPTF方法最终成为领导者?由于并发性和体系结构,在查询计划的中间嵌入了一个作为并行管道表函数创建的函数:

图片

Source: https://habr.com/ru/post/zh-CN418757/


All Articles