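Wu Yishan's Blog http://blog.sciencenet.cn/u/Wuyishan Researcher, Chinese Academy of Science and Technology for Development; doctoral supervisor, Department of Information Management, Nanjing University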

Another Important Book on Big Data: Predictive Analytics

2014-1-2 06:37 | Personal category: Book reviews and introductions | System category: Research notes

Wu Yishan

 

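In February 2013, Wiley published Eric Siegel's book Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Siegel was formerly an assistant professor at Columbia University in the United States, specializing in natural language processing; he later founded Predictive Analytics World and now devotes himself to researching and promoting data analytics. The book's table of contents follows, and I have translated the chapter titles into Chinese. Since I have not read the book itself, my understanding and translation of the chapter titles may contain errors and are offered for reference only. Judging from the table of contents, this is a book to look forward to!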

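I do not know whether any publisher has already acquired the rights to this book. If not, then on this first working day of the new year, please make a prediction: will the book be translated into Chinese and published? If it is, how well will the Chinese edition sell? Can it outsell The Signal and the Noise, which I introduced in "This Man Is No Ordinary Man" (这个男人不寻常, http://blog.sciencenet.cn/blog-1557-750288.html)?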

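Mr. Lin Fusong (林富松), a member of the Taiwan Association of Translation and Interpretation, notes in an article (http://www.taiwantati.org/?p=2038) that Big Data (published in mainland China as 《大数据时代》 by Zhejiang People's Publishing House, translated by Sheng Yangyan and Zhou Tao; Taiwan's 天下 publishing house also translated it, under the title 《大数据》) and Predictive Analytics are "two books that essentially contradict each other." In that case, I think it is all the more worthwhile to read the two side by side.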

 

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
Table of Contents

Foreword        Thomas H. Davenport

(The well-known management scholar Davenport wrote the foreword to this book)

Preface(作者序)

What is the occupational hazard of predictive analytics?什么是预测型数据分析术的职业风险?

Introduction(引言)
 The Prediction Effect
(预测效应)

How does predicting human behavior combat risk, fortify healthcare, toughen crime fighting, and boost sales? Why must a computer learn in order to predict? How can lousy predictions be extremely valuable? What makes data exceptionally exciting? How is data science like porn? Why shouldn't computers be called computers? Why do organizations predict when you will die?

Chapter 1
 Liftoff! Prediction Takes Action (deployment)

开始!预测动手了(布局问题)

How much guts does it take to deploy a predictive model into field operation, and what do you stand to gain? What happens when a man invests his entire life savings into his own predictive stock market trading system?

Chapter 2
 With Power Comes Responsibility: Hewlett-Packard, Target, and the Police Deduce Your Secrets (ethics)

有权力就有责任:休莱特-帕卡德、目标和警察推断出了你的秘密(伦理问题)

How do we safely harness a predictive machine that can foresee job resignation, pregnancy, and crime? Are civil liberties at risk? Why does one leading health insurance company predict policy holder death? An extended sidebar on fraud detection addresses the question: how does machine intelligence flip the meaning of fraud on its head?

Chapter 3
 The Data Effect: A Glut at the End of the Rainbow (data)

数据效应:彩虹尽头的过量(数据问题)

We are up to our ears in data, but how much can this raw material really tell us? What actually makes it predictive? Does existing data go so far as to reveal the collective mood of the human populace? If yes, how does our emotional online chatter relate to the economy's ups and downs?

Chapter 4
 The Machine That Learns: A Look Inside Chase's Prediction of Mortgage Risk (modeling)

会学习的机器:Chase对按揭风险的预测之解剖(建模问题)

What form of risk has the perfect disguise? How does prediction transform risk to opportunity? What should all businesses learn from insurance companies? Why does machine learning require art in addition to science? What kind of predictive model can be understood by everyone? How can we confidently trust a machine's predictions? Why couldn't prediction prevent the global financial crisis?

Chapter 5
 The Ensemble Effect: Netflix, Crowdsourcing, and Supercharging Prediction (ensembles)

集团效应:Netflix、众包和增压预测(集团问题)

To crowdsource predictive analytics—outsource it to the public at large—a company launches its strategy, data, and research discoveries into the public spotlight. How can this possibly help the company compete? What key innovation in predictive analytics has crowdsourcing helped develop? Must supercharging predictive precision involve overwhelming complexity, or is there an elegant solution? Is there wisdom in nonhuman crowds?

Chapter 6
 Watson and the Jeopardy! Challenge (question answering)

机器人华生和《危机边缘》挑战(应答问题)

How does Watson—IBM's Jeopardy!-playing computer—work? Why does it need predictive modeling in order to answer questions, and what secret sauce empowers its high performance? How does the iPhone's Siri compare? Why is human language such a challenge for computers? Is artificial intelligence possible?

Chapter 7
 Persuasion by the Numbers: How Telenor, U.S. Bank, and the Obama Campaign Engineered Influence (uplift)

用数字来劝诱:Telenor(博主:挪威的电信公司)、美国银行和奥巴马竞选阵营是如何创造与施加影响力的(造势问题)

What is the scientific key to persuasion? Why does some marketing fiercely backfire? Why is human behavior the wrong thing to predict? What should all businesses learn about persuasion from presidential campaigns? What voter predictions helped Obama win in 2012 more than the detection of swing voters? How could doctors kill fewer patients inadvertently? How is a person like a quantum particle? Riddle: What often happens to you that cannot be perceived, and that you can't even be sure has happened afterward—but that can be predicted in advance?

Afterword(跋)

Ten Predictions for the First Hour of 2020

关于2020年第一个小时将发生什么的十大预言

Appendices(附录)


A.Five Effects of Prediction

5种预测效应

B.Twenty-One Applications of Predictive Analytics

预测型数据分析术的21桩应用

C.Prediction People—Cast of "Characters"

预测人――各种“角色”的出演阵容

Notes

Acknowledgments

About the Author

Index

I also found an article online that introduces the book (http://www.oktranslation.com/news/twininfo35717.html), reproduced below:

Big Data: A High-Tech "Crystal Ball" for Foreseeing the Future

Date: 2013-4-23 Source: NetEase

  [You may remember that early last year Target found] itself at the center of a storm of outrage. The retailer's number crunchers had come up with a statistical method for predicting which of its customers were most likely to become pregnant in the near future, giving Target's marketers a head start on pitching them baby products.

  The model worked: Target expanded its customer base for pregnancy and infant-care products by about 30%. But the media brouhaha, with everyone from The New York Times to Fox News accusing the company of "spying" on shoppers, took weeks to die down.

  If Target's success at setting its sights on potential moms-to-be gives you the creeps, Eric Siegel's new book could ruin your whole day. Siegel is a former Columbia professor whose company, Predictive Impact, builds mathematical models that cull valuable nuggets of data from floods of raw information. Companies use the tools to forecast everything from what we'll shop for, to which movies we'll watch, to how likely we are to be in a car accident or default on our credit cards.

  In Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Siegel explains how these models work and where the pitfalls are, in clear, colorful terms. Simply put, predictive analytics, or PA, is the science of learning from experience. Starting with data about the past and current behavior of a given group of people -- whether customers, patients, prison inmates up for parole, voters, or employees -- analysts can predict what they'll probably do next.

  This kind of high-tech crystal ball is behind "the growing trend to make decisions more 'data driven,'" Siegel writes. "In fact, an organization that doesn't leverage its data in this way is like a person with a photographic memory who never bothers to think."

  Predictive Analytics is packed with examples of how Citi, Facebook, Ford, IBM, Google, Netflix, PayPal and many other businesses and government agencies have put PA to work. Pfizer, for instance, has a predictive model to foretell the likelihood that a patient will respond to a given new drug within three weeks. LinkedIn uses PA to pinpoint the fellow members you might want as connections. At the IRS, a mathematical ranking system applied to past tax returns "empowered IRS analysts to find 25 times more tax evasion, without increasing the number of investigations."

  And then there's Hewlett-Packard. A couple of years ago, alarmed by annual turnover rates in some divisions as high as 20%, HP decided to try anticipating which of its 330,000 employees worldwide were most likely to quit. Beginning with reams of data on things like salaries, raises, promotions, and job rotations, a team of analysts correlated that information with detailed employment records of people who had already left. Based on the similarities they found, the researchers assigned each current employee a Flight Risk score.
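  The article does not say which method HP's analysts used. As a rough illustration of scoring current employees by their similarity to people who have already left, here is a minimal sketch that swaps in a simple nearest-neighbour vote. The records, the feature choice, the value of k, and the function name flight_risk are all invented for this example.

```python
import math

# Hypothetical history, invented for illustration only:
# (salary in thousands, raises, promotions, job rotations, left the company?)
HISTORY = [
    (52, 0, 0, 3, True),
    (55, 1, 0, 2, True),
    (58, 0, 0, 4, True),
    (75, 3, 1, 1, False),
    (80, 2, 2, 0, False),
    (72, 3, 1, 1, False),
]


def _distance(a, b, scales):
    """Euclidean distance after dividing each feature by its scale."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def flight_risk(employee, history=HISTORY, k=3):
    """Share of the k most similar past employees who left (0.0 to 1.0).

    This is a stand-in technique (k-nearest neighbours), not HP's model.
    """
    features = [row[:-1] for row in history]
    # Scale every feature by its observed range so salary does not dominate.
    scales = [(max(col) - min(col)) or 1 for col in zip(*features)]
    ranked = sorted(history, key=lambda row: _distance(employee, row[:-1], scales))
    nearest = ranked[:k]
    return sum(1 for row in nearest if row[-1]) / k


if __name__ == "__main__":
    print(flight_risk((54, 0, 0, 3)))  # profile close to past leavers
    print(flight_risk((78, 3, 1, 1)))  # profile close to past stayers
```

  A production system would presumably use far more records and a properly validated model; the sketch only shows how past departures can be turned into a per-employee score.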

(Editor in charge: Lu Xiaoxue)

 




https://wap.sciencenet.cn/blog-1557-755026.html
