Bill Franks. Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics. NJ, USA: John Wiley & Sons, 2012.
[Chinese edition] Bill Franks (USA). 驾驭大数据. Translated by Huang Hai, Che Haoyang, Wang Yue, et al. Beijing: Posts & Telecom Press, 2013.
Started reading: 2017-02-03
Finished reading: 2017-02-09
【Editions read】Chinese print edition and the English original.
【Excerpts from the original】
Propensity: a strong natural tendency to do something.
Franks points out that much of the volume of big data isn’t useful anyway, and that it’s important to focus on filtering out the dross data. Pxv. Dross data: unwanted material.
These traits include commitment, creativity, business savvy, presentation skills, and intuition. Pxxii.
According to the Gartner Group, the “big” in big data also refers to several other characteristics of a big data source. These aspects include not just increased volume but increased velocity and increased variety. These factors, of course, lead to extra complexity as well. P5.
Nefarious: evil or immoral. P12.
The fact is that the decision to start taming big data shouldn’t be a big stretch. P13.
It is necessary to come up with some ideas, even if they are small, and make something happen quickly. P17.
It’s more like sipping water from a hose: You slurp out just what you need and let the rest run by. P18.
Intimidate: to make (someone) afraid. Similarly, what we are intimidated by today won’t be so scary a few years down the road. P24.
Wrap-up: summary. P26.
Creating structured data out of unstructured text is often called information extraction. P58.
Sentiment analysis looks at the general direction of opinion across a large number of people to provide information on what the market is saying, thinking, and feeling about an organization. It often uses data from social media sites. P59.
Newer games often offer in-game purchases for a small fee. These are known as microtransactions. P77.
Such single-purpose databases are often called “data marts.” While many organizations still leverage data marts heavily, leading organizations now see value in combining the various database systems into one big system called an Enterprise Data Warehouse (EDW). P91.
In other words, move the analysis to the data instead of moving the data to the analysis. This is the concept of in-database analytics. P93.
MapReduce is a parallel programming framework. It’s neither a database nor a direct competitor to database. P110.
Hadoop is the best-known implementation of the MapReduce framework. P111.
Similarly, a sandbox in the analytics context is a set of resources that enable analytic professionals to experiment and reshape data in whatever fashion they need to. Other terms used for the sandbox concept include an agile analytics cloud and a data lab, among others. P123.
The wisdom of crowds. P155.
A reporting environment, as we will define it here, is also often called a business intelligence (BI) environment. P180.
The reason these answers aren’t correct ties to a phrase used in mathematical proofs: necessary, but not sufficient. P203.
An organization can’t win by doing the same thing it sees its top competitor successfully do. It has to get there first. P283.
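As a note on the P93 excerpt about in-database analytics, here is a minimal sketch of the idea, not anything taken from the book. It uses Python's built-in sqlite3 module as a stand-in for a real data warehouse, and the `sales` table, its columns, and the function names are all hypothetical. The first function pulls every row out and sums in the client ("moving the data to the analysis"); the second lets the database do the aggregation and returns only the result ("moving the analysis to the data").

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("east", 30.0), ("west", 5.0)],
    )
    return conn


def totals_by_moving_data(conn):
    """Fetch every row into the client, then aggregate in Python."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn):
    """Let the database aggregate; only one row per region comes back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_by_moving_data(conn))
    print(totals_in_database(conn))
```

Both functions return the same totals; the difference is how much data crosses the boundary between the database and the analysis code, which is what matters once the table is large.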
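【Reflections】
Seeing “Web data in action” rendered in the Chinese edition as “行动中的网络数据” (literally, “network data in the midst of action”) genuinely annoyed me.
MapReduce is suited to problems in which the subtasks are independent of one another.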
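To make the point about independent subtasks concrete, here is a minimal word-count sketch in plain Python. It is only an illustration of the map, shuffle, and reduce steps, not Hadoop's API: the function names are hypothetical, and a local thread pool stands in for the cluster workers. Each chunk is mapped without reference to any other chunk, which is why the map step can run in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk; needs nothing from other chunks."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group the emitted values by key across all mapped chunks."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values for each key; here, sum the counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks, workers=2):
    """Run the map step in parallel, then shuffle and reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data big value", "data streams", "big streams"]))
```

If one chunk's result depended on another chunk's output, the map step could no longer be distributed this way, which is the limitation the note above refers to.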
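The author first explains the concept of big data, why big data matters so much, and the different sources and forms it takes. He then lays out how to tame big data from the angles of technology, tools, methods, people, organization, and a culture of innovation. The tools and technologies are covered only in broad strokes, yet the treatment is very accessible. The author does not stop at explaining technical concepts; he goes on to discuss how to become a great analytics professional, how to build a great analytics team, and how to foster an atmosphere of innovation and discovery within an organization. He stresses that organizations should take big data analysis seriously and keep up with this wave, and that rather than confining themselves to technologies and tools, they should place more weight on the people who use the tools, namely the analytics professionals. Even more welcome, the author covers the big data technologies MPP, MapReduce, and grid computing, as well as the open-source software R, and points out where R is applicable. This is essentially a popular-science book on technology, and I benefited greatly from reading it.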