Big Data, Crystal Balls and Looking Glasses: Reviewing 2016, predicting 2017
End-of-year reviews are boring -- and everyone does them. Predictions are boring -- and they are hard. Of course, this is different -- because big data.
How do big data people go about making end-of-year reviews and predictions? Using data is the obvious answer, but there are a few issues with that approach: there is no synthesis in data alone -- you have to find the story behind the data, pick an angle and seek meaning. In addition, that approach does not account for subtle hints, industry knowledge, and big ideas.
To paraphrase Carl Sagan, "we wish to find the truth, no matter where it lies. But to find the truth we need imagination and data both. We will not be afraid to speculate, but we will be careful to distinguish speculation from fact." In this spirit, let's keep things equally opinionated and objective in 2017.
It's the end of Hadoop as we know it, and I feel fine
Hadoop turned 10 in 2016. It's come a long way from a pet project named after a toy elephant to the (metaphorical) stampeding beast now in most every CXO's name-dropping list. The latest Big Data maturity survey showed that 73 percent of respondents are now in production with Hadoop (vs. 65 percent last year). And yet we're here to tell you Hadoop as we know it is dead. And that's not even news.
Hadoop has been constantly evolving, expanding, and re-inventing itself throughout its lifetime. A massive ecosystem has been developing around the initial bare-bones offering, and today Hadoop is more of a platform than "just" a storage and compute framework. The introduction of YARN was a game changer, enabling Hadoop to become a Big Data OS and to break away from its batch-oriented MapReduce origins.
In 2016, data and stories from the trenches all pointed in the same direction: batch, MapReduce Hadoop is dead, long live real-time, Spark Hadoop. Today, 25 percent of organizations are using Spark in production, with an additional 33 percent using it in development, and all major Hadoop vendors are involved in it. Adding those figures up suggests that by the end of 2017 up to 50 percent of organizations could be using Spark in production.
But it's not necessarily a Spark or bust future: neither is Spark the only streaming game in town, nor is Hadoop the only Big Data platform. Alternatives do exist, and users may migrate or leapfrog to them skipping Spark or Hadoop altogether, the same way they are now migrating from or skipping MapReduce.
The Big Data landscape is host to a multitude of different approaches. But more and more it looks like everyone is adding everyone else's features. Convergence or me-too? Image: Martin Kleppmann.
Becoming all things to all men to save some
Spark can do both streaming and batch processing. And it can also do SQL, and graphs. And of course on Hadoop you can also do SQL and/or NoSQL in a number of other ways, utilizing a wide choice of tools. That's what being an ecosystem is all about, right? But then again, everyone seems to be at it these days.
NoSQL databases like Cassandra / DataStax Enterprise can now also do graph, in addition to key-value, tabular and document. What about the iconic NoSQL document store - MongoDB? Well, besides document, you can now also do SQL. Microsoft's SQL Server? Your average SQL server no more: it can run on Linux, and it supports R, in-memory processing and column store. MariaDB, the poor man's SQL server, also has its column store now.
Neo4J, the iconic graph store? It's going ACID. Google's BigQuery now supports standard SQL, joining Amazon Redshift, which has had it for a while as it's based on Postgres. Of course, analytics-oriented column stores have long supported SQL. And traditional relational DBs like Oracle and IBM have been adding features like in-memory processing and column store for a while as well. Key-stores do it, document-stores do it, graph-stores do it, even SQL incumbents do it.
The boundaries are blurring, as more and more data platforms try to be more things to more people. Doing most everything on the same platform is good for vendors that want to increase their retention and good for users who don't want to have to mix and match disparate platforms to get things done. But it's not a sheer land-ho of opportunity - threats lie ahead too. Most notably, vendor lock-in, half-baked features, and half-hearted users.
Some are trying to get the basics right, while some are after up in the sky goals. Yet, there's a place for everyone under Big Data. Image: Martin Kleppmann
This article is from http://www.zdnet.com/article/big-data-crystal-balls-and-looking-glasses-reviewing-2016-predicting-2017/