Comparison of View-Size Estimation Algorithms in OLAP

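Authors: Xinchen Cui, Zhenlin Chen, Fang Zhao (Department of Ordnance Science and Technology, Naval Aeronautical and Astronautical University, Yantai)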

Keywords: View-Size Estimation; Materialized Views; OLAP; Data Warehouse

Abstract: Materializing views in an OLAP system requires view-size estimates that are fast, reliable, and accurate. Many view-size estimation techniques rely on specific statistical assumptions, and their errors can be large. Probabilistic estimation methods may be slower, but when estimating large views they are more accurate and reliable while using less memory. This paper introduces several hashing-based view-size estimation methods and compares them experimentally. The results show that Adaptive Counting gives accurate estimates regardless of view size, and its estimation speed remains fast as the memory budget increases.
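
A view's size is the number of distinct group-by key combinations it contains, so hashing-based estimators treat it as a distinct-count problem solved in one pass with fixed memory. The following is a minimal sketch of Adaptive Counting under stated assumptions, not the authors' code: it keeps LogLog registers (Durand and Flajolet) and falls back to linear counting (Whang et al.) while many buckets are still empty, which is how Cai et al.'s method is usually described. The names `hash64` and `AdaptiveCounter`, the SHA-1-based hash, the bucket-bit parameter `k`, and the 0.051 empty-bucket switching threshold are illustrative assumptions; treat the threshold as tunable.

```python
import hashlib
import math


def hash64(value: str) -> int:
    """Map a value to a 64-bit integer (first 8 bytes of its SHA-1 digest)."""
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


class AdaptiveCounter:
    """Distinct-count sketch: LogLog registers with a linear-counting fallback."""

    ALPHA = 0.39701                   # asymptotic LogLog bias-correction constant
    EMPTY_FRACTION_THRESHOLD = 0.051  # assumed switch point between the estimators

    def __init__(self, k: int = 10) -> None:
        self.k = k                    # bits of the hash used to choose a bucket
        self.m = 1 << k               # number of buckets, i.e. the memory budget
        self.width = 64 - k           # bits left over for the rank computation
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = hash64(value)
        bucket = h >> self.width                  # top k bits select the bucket
        rest = h & ((1 << self.width) - 1)        # remaining bits of the hash
        # rank = 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 when `rest` is all zeros)
        rank = self.width - rest.bit_length() + 1
        self.registers[bucket] = max(self.registers[bucket], rank)

    def estimate(self) -> float:
        empty = self.registers.count(0)           # buckets never hit
        if empty / self.m >= self.EMPTY_FRACTION_THRESHOLD:
            # Linear counting: m * ln(m / V), with V the number of empty buckets
            return self.m * math.log(self.m / empty)
        # LogLog: alpha * m * 2^(mean register value)
        return self.ALPHA * self.m * 2 ** (sum(self.registers) / self.m)


if __name__ == "__main__":
    # Size of a hypothetical view GROUP BY (a, b) over 10,000 synthetic rows;
    # the rows contain 2,100 distinct (a, b) pairs.
    rows = [(i % 300, i % 7) for i in range(10_000)]
    counter = AdaptiveCounter(k=10)
    for a, b in rows:
        counter.add(f"{a}|{b}")
    print(round(counter.estimate()))
```

Each row's key is hashed once, so the pass is linear in the number of rows and memory stays at m registers however large the view is; the printed value should be close to 2,100, with the exact figure depending on the hash. Raising `k` enlarges the memory budget, and LogLog's relative error shrinks roughly as 1.30/sqrt(m).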

Cite this article: Cui, X., Chen, Z. and Zhao, F. (2014) Comparison of View-Size Estimation Algorithms in OLAP. Computer Science and Application, 4, 119-124. doi: 10.12677/CSA.2014.47018
