# Another Important Book on Big Data: Predictive Analytics

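In February 2013, Wiley published Eric Siegel's book Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die. Eric Siegel was formerly an assistant professor at Columbia University specializing in natural language processing, and later founded Predictive Analytics World, devoting himself to researching and promoting analytics. The book's table of contents follows this post, and I have translated the chapter titles into Chinese. Since I have not read the original book, my understanding and translation of the chapter titles may contain errors, so treat them as a reference only. Judging from the table of contents, this book is worth looking forward to!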

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die

You may recall that early last year Target found itself at the center of a storm of outrage. The retailer's number crunchers had come up with a statistical method for predicting which of its customers were most likely to become pregnant in the near future, giving Target's marketers a head start on pitching them baby products.

The model worked: Target expanded its customer base for pregnancy and infant-care products by about 30%. But the media brouhaha, with everyone from The New York Times to Fox News accusing the company of "spying" on shoppers, took weeks to die down.

If Target's success at setting its sights on potential moms-to-be gives you the creeps, Eric Siegel's new book could ruin your whole day. Siegel is a former Columbia professor whose company, Predictive Impact, builds mathematical models that cull valuable nuggets of data from floods of raw information. Companies use the tools to forecast everything from what we'll shop for, to which movies we'll watch, to how likely we are to be in a car accident or default on our credit cards.

In Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, Siegel explains how these models work and where the pitfalls are, in clear, colorful terms. Simply put, predictive analytics, or PA, is the science of learning from experience. Starting with data about the past and current behavior of a given group of people -- whether customers, patients, prison inmates up for parole, voters, or employees -- analysts can predict what they'll probably do next.

This kind of high-tech crystal ball is behind "the growing trend to make decisions more 'data driven,'" Siegel writes. "In fact, an organization that doesn't leverage its data in this way is like a person with a photographic memory who never bothers to think."

Predictive Analytics is packed with examples of how Citi, Facebook, Ford, IBM, Google, Netflix, PayPal and many other businesses and government agencies have put PA to work. Pfizer, for instance, has a predictive model to foretell the likelihood that a patient will respond to a given new drug within three weeks. LinkedIn uses PA to pinpoint the fellow members you might want as connections. At the IRS, a mathematical ranking system applied to past tax returns "empowered IRS analysts to find 25 times more tax evasion, without increasing the number of investigations."

And then there's Hewlett-Packard. A couple of years ago, alarmed by annual turnover rates in some divisions as high as 20%, HP decided to try anticipating which of its 330,000 employees worldwide were most likely to quit. Beginning with reams of data on things like salaries, raises, promotions, and job rotations, a team of analysts correlated that information with detailed employment records of people who had already left. Based on the similarities they found, the researchers assigned each current employee a Flight Risk score.
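To make the idea of a similarity-based score concrete, here is a minimal sketch in Python. It is not HP's actual method, which the review does not describe in detail. The three features, the sample records, the function names, and the choice of a k-nearest-neighbor vote are all assumptions made purely for illustration.

```python
from math import sqrt

# Hypothetical historical records (invented for illustration):
# ((raises in last 3 years, years since last promotion, job rotations), left_company)
HISTORY = [
    ((0, 5, 0), True),
    ((0, 4, 1), True),
    ((1, 4, 0), True),
    ((2, 1, 2), False),
    ((3, 1, 1), False),
    ((2, 2, 3), False),
]


def distance(a, b):
    """Euclidean distance between two feature tuples of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flight_risk(employee, history=HISTORY, k=3):
    """Return the fraction of the k most similar past records that ended in a departure."""
    nearest = sorted(history, key=lambda rec: distance(employee, rec[0]))[:k]
    return sum(1 for _, left in nearest if left) / k


if __name__ == "__main__":
    # An employee with no raises and a long wait since the last promotion
    print(flight_risk((0, 5, 1)))
    # An employee with regular raises and a recent promotion
    print(flight_risk((3, 1, 2)))
```

A score near 1 means the employee most resembles people who left, and a score near 0 means the employee most resembles people who stayed. A real system would likely use many more features, scale them to comparable ranges, and validate the scores against later departures.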

(Editor: Lu Xiaoxue)

https://wap.sciencenet.cn/blog-1557-755026.html
