river-of-bytes.com
A River of Bytes: January 2015
http://www.river-of-bytes.com/2015_01_01_archive.html
A River of Bytes. Sunday, January 25, 2015. NSMC: A Native MongoDB Connector for Apache Spark. Both core Spark and Spark SQL provide ways to neatly plug in external database engines as a source of data. In this post I'm going to describe an experimental MongoDB connector. MongoDB makes an interesting case study as an external Spark data source for a number of reasons: the data model, based on collections of JSON-like documents, is both deeply ad hoc (...). NSMC is hosted on GitHub. In Spark SQL Integration for MongoDB ...
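The snippet above describes plugging an external database engine into Spark SQL. A minimal sketch of what that looks like from the query side, using the provider class named later in these posts (`nsmc.sql.MongoRelationProvider`); the option keys shown (`host`, `port`, `db`, `collection`) and the table/collection names are assumptions for illustration, not confirmed NSMC parameters:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object MongoQuerySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("NSMCSketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)

    // Register a MongoDB collection as a temporary table through the
    // Spark 1.x external data source mechanism ("USING <provider class>").
    sqlContext.sql(
      """CREATE TEMPORARY TABLE mongoTable
        |USING nsmc.sql.MongoRelationProvider
        |OPTIONS (host 'localhost', port '27017', db 'test', collection 'people')
      """.stripMargin)

    // From here on, the collection is queryable like any other table.
    sqlContext.sql("SELECT * FROM mongoTable").show()
  }
}
```

Once registered this way, Spark SQL treats the MongoDB collection as an ordinary relation, which is what makes the plug-in model "neat": queries need no MongoDB-specific code.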
river-of-bytes.com
A River of Bytes: Efficient Spark SQL Queries to MongoDB
http://www.river-of-bytes.com/2015/05/efficient-spark-sql-queries-to-mongodb.html
A River of Bytes. Saturday, May 16, 2015. Efficient Spark SQL Queries to MongoDB. In previous posts I've discussed a native Apache Spark connector for MongoDB (NSMC) and NSMC's integration with Spark SQL. As discussed in earlier posts, the major challenges in implementing a Spark SQL external data source for MongoDB are: efficient schema inference for the entire collection ... The NSMC project is hosted on GitHub ... and the class nsmc.sql.MongoRelationProvider ... to which each document in the collection conforms ...
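The schema-inference challenge named above — finding a single schema to which every document in a schemaless collection conforms — can be sketched in plain Scala by merging per-document field maps. This is an illustrative toy, not NSMC's actual algorithm; the type names and the "widen conflicts to any" rule are assumptions:

```scala
// Minimal sketch of collection-level schema inference by merging
// per-document field maps; names and the conflict rule are illustrative.
object SchemaMergeSketch {
  // Each document's "schema": field name -> type name.
  type DocSchema = Map[String, String]

  // Merge two schemas; a field seen with conflicting types widens to "any".
  def merge(a: DocSchema, b: DocSchema): DocSchema =
    (a.keySet ++ b.keySet).map { k =>
      k -> ((a.get(k), b.get(k)) match {
        case (Some(t1), Some(t2)) if t1 == t2 => t1
        case (Some(t1), None)                 => t1
        case (None, Some(t2))                 => t2
        case _                                => "any"
      })
    }.toMap

  // Fold the merge over every document to get the collection schema.
  def infer(docs: Seq[DocSchema]): DocSchema =
    docs.foldLeft(Map.empty: DocSchema)(merge)
}
```

The efficiency question the post raises is exactly that this fold has to touch every document (or a sample of them), which is why doing it well for a whole collection is called out as a major challenge.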
river-of-bytes.com
A River of Bytes: August 2016
http://www.river-of-bytes.com/2016_08_01_archive.html
A River of Bytes. Sunday, August 28, 2016. Taking a Detour with Apache Spark. Almost two years ago, while preparing for a talk I was giving at the now defunct Seattle Eastside Scala Meetup, I started a public GitHub project. But lately I've had to ask myself some hard questions about the project. As I hope to post about separately soon, the evolution of Spark SQL's object model (remember SchemaRDD?) ... Time to explore the Java APIs ... has it at 76% Scala, 58% Java 8 and 34% Java 7 or lower. Clearly, there ...
river-of-bytes.com
A River of Bytes: NSMC: A Native MongoDB Connector for Apache Spark
http://www.river-of-bytes.com/2015/01/nsmc-native-mongodb-connector-for.html
river-of-bytes.com
A River of Bytes: December 2014
http://www.river-of-bytes.com/2014_12_01_archive.html
A River of Bytes. Sunday, December 28, 2014. Filtering and Projection in Spark SQL External Data Sources. In the previous post ... As with other posts in this series, I'm assuming familiarity with Scala, and the code for the examples can be found at https://github.com/spirom/LearningSpark, in this case sql/RelationProviderFilterPushdown.scala. As we discussed last time, the external data source API is located in the org.apache.spark.sql package ... which also supports filter pushdown. External Data Source Adapter: ...
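Filter pushdown in the Spark 1.x external data source API is expressed by mixing `PrunedFilteredScan` into a `BaseRelation`: Spark SQL hands the relation the projected columns and the filters it can push down. A sketch of the shape of such a relation follows; the class name is hypothetical and the body is elided, but the interfaces and the `buildScan` signature are from the Spark 1.x `org.apache.spark.sql.sources` API:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types.StructType

// Illustrative relation that accepts projection and filter pushdown.
class ExampleRelation(
    override val sqlContext: SQLContext,
    override val schema: StructType)
  extends BaseRelation with PrunedFilteredScan {

  override def buildScan(
      requiredColumns: Array[String], // projection pushed down by Spark SQL
      filters: Array[Filter]          // filters pushed down, e.g. EqualTo("id", 42)
  ): RDD[Row] = {
    // A real source would translate the filters into its native query
    // language (for MongoDB, a query document) and fetch only the
    // required columns, instead of materializing the whole collection.
    ???
  }
}
```

The payoff is that predicates and projections evaluate inside the external engine rather than in Spark, which is what makes queries against a source like MongoDB efficient.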
river-of-bytes.com
A River of Bytes: Developing Utility Bolts for Apache Storm
http://www.river-of-bytes.com/2016/10/developing-utility-bolts-for-apache.html
A River of Bytes. Sunday, October 23, 2016. Developing Utility Bolts for Apache Storm. Storm provides various pre-defined components, most of them spouts, providing standard data sources for streaming data from database systems, file systems, queueing systems, and network listeners such as a web server. Similarly, it provides pre-defined bolts, some serving as data sinks along the same lines, as well as interfaces to the usual logging frameworks. The project is on GitHub and includes a small number ...
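To make the idea of a small "utility" bolt concrete, here is a minimal sketch of one that uppercases a string field and re-emits it. The bolt name and field name are invented for illustration, and the package names assume Storm 1.x (`org.apache.storm`); this is not one of the bolts from the post's project:

```scala
import org.apache.storm.topology.base.BaseBasicBolt
import org.apache.storm.topology.{BasicOutputCollector, OutputFieldsDeclarer}
import org.apache.storm.tuple.{Fields, Tuple, Values}

// Illustrative utility bolt: transforms one field and passes it along.
class UppercaseBolt extends BaseBasicBolt {

  // Called once per incoming tuple; BaseBasicBolt handles acking.
  override def execute(input: Tuple, collector: BasicOutputCollector): Unit = {
    val line = input.getStringByField("line")
    collector.emit(new Values(line.toUpperCase))
  }

  // Declare the single output field downstream bolts will see.
  override def declareOutputFields(declarer: OutputFieldsDeclarer): Unit =
    declarer.declare(new Fields("line"))
}
```

Extending `BaseBasicBolt` rather than the lower-level `IRichBolt` keeps a utility bolt this small, since anchoring and acknowledgment are handled by the framework.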
river-of-bytes.com
A River of Bytes: Taking a Detour with Apache Spark
http://www.river-of-bytes.com/2016/08/taking-detour-with-apache-spark.html
river-of-bytes.com
A River of Bytes: Filtering and Projection in Spark SQL External Data Sources
http://www.river-of-bytes.com/2014/12/filtering-and-projection-in-spark-sql.html