xiaohai2008's personal blog http://blog.sciencenet.cn/u/xiaohai2008


Mining Frequent & Maximal Reference Sequences with GST

Read 2868 times | 2011-10-18 13:59 | Personal category: Log mining | System category: Paper exchange | Keywords: Web, Mining, Usage, Generalized, Suffix

Web usage mining (WUM) is the branch of Web mining concerned with automatically discovering user access patterns from large Web access logs. In this study, we analyze the generalized suffix tree (GST) data structure in depth in the WUM setting and explain in detail why a linear-time traversal of the tree yields the frequent reference sequences. The key point is that, owing to the special nature of transactions, for each internal node v the total number of leaves in the subtree of v equals the number of distinct (navigation-content) transaction identifiers that appear at those leaves. Building on this, we propose a GST-based algorithm for mining maximal reference sequences. Experimental results indicate that the approach is feasible and scales well.
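The leaf-count property can be illustrated with a small Python sketch. This is not the algorithm from the paper: it uses an uncompressed suffix trie built in quadratic time rather than a linear-time generalized suffix tree, and all names (`Node`, `build_suffix_trie`, `frequent_sequences`, `maximal_sequences`) are hypothetical. It also assumes that the "special nature" of transactions means no page repeats within a transaction, and that "maximal" means a frequent sequence that is not a contiguous part of any other frequent sequence. Neither assumption is spelled out in the abstract.

```python
from typing import Dict, List, Set, Tuple

Sequence = Tuple[str, ...]


class Node:
    """One node of an uncompressed generalized suffix trie (illustrative only)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.leaf_count = 0          # suffixes passing through = leaves below this node
        self.tids: Set[int] = set()  # distinct transaction ids below this node


def build_suffix_trie(transactions: List[List[str]]) -> Node:
    """Insert every suffix of every transaction (quadratic time, not linear)."""
    root = Node()
    for tid, pages in enumerate(transactions):
        for start in range(len(pages)):
            node = root
            for page in pages[start:]:
                node = node.children.setdefault(page, Node())
                node.leaf_count += 1
                node.tids.add(tid)
    return root


def leaf_count_matches_tids(root: Node) -> bool:
    """Check that leaf count equals distinct transaction count at every node."""
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children.values():
            if child.leaf_count != len(child.tids):
                return False
            stack.append(child)
    return True


def frequent_sequences(root: Node, min_support: int) -> Dict[Sequence, int]:
    """One traversal; a subtree is pruned once its leaf count drops below min_support."""
    result: Dict[Sequence, int] = {}
    stack: List[Tuple[Node, Sequence]] = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        for page, child in node.children.items():
            if child.leaf_count >= min_support:
                seq = prefix + (page,)
                result[seq] = child.leaf_count
                stack.append((child, seq))
    return result


def is_contiguous_part(short: Sequence, long: Sequence) -> bool:
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_sequences(frequent: Dict[Sequence, int]) -> List[Sequence]:
    """Naive filter (assumed definition): keep sequences contained in no other one."""
    seqs = list(frequent)
    return [s for s in seqs
            if not any(s != t and is_contiguous_part(s, t) for t in seqs)]


# Made-up transactions; no page repeats inside any single transaction.
transactions = [["A", "B", "C", "D"], ["B", "C", "D"], ["A", "B", "C"], ["C", "D", "E"]]
root = build_suffix_trie(transactions)
frequent = frequent_sequences(root, min_support=2)
maximal = maximal_sequences(frequent)
```

Because no page repeats inside a transaction here, each path label occurs at most once per transaction, so the leaf count at a node can be read directly as the support of its sequence. With a transaction such as `["A", "B", "A"]` the two counts diverge at the node for `A`, which is why the property depends on how transactions are formed.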

Original paper: 2010_6_7_2187_2197.pdf


https://wap.sciencenet.cn/blog-611051-498112.html
