Article list
1. flow: 1.1 shuffle abstract, 1.2 shuffle flow, 1.3 sort flow in shuffle, 1.4 data structures in memory; 2. core code paths //SortShuffleWriter override def write(records: Iterator[Product2[K, V]]): Unit = { //-how to collect this result by partition? by the index file //-1 sort ...
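The entry's core idea can be sketched independently of Spark's own classes: sort the records by partition id, append them all to one data file, and record each partition's starting byte offset in an index file. A minimal sketch, assuming a length-prefixed record layout and hypothetical names throughout (this is not the real SortShuffleWriter):

    import java.io.{DataOutputStream, FileOutputStream}

    object SortShuffleSketch {
      // records: (partitionId, payload) pairs produced by one map task
      def writePartitioned(records: Seq[(Int, String)],
                           numPartitions: Int,
                           dataPath: String,
                           indexPath: String): Unit = {
        val sorted = records.sortBy(_._1)                 // 1. sort by partition id
        val offsets = new Array[Long](numPartitions + 1)  // offsets(p) = where partition p starts
        val data = new DataOutputStream(new FileOutputStream(dataPath))
        var written = 0L
        var p = 0
        for ((pid, rec) <- sorted) {
          while (p <= pid) { offsets(p) = written; p += 1 }  // mark starts of partitions up to pid
          val bytes = rec.getBytes("UTF-8")
          data.writeInt(bytes.length)                     // length-prefixed record
          data.write(bytes)
          written += 4 + bytes.length
        }
        while (p <= numPartitions) { offsets(p) = written; p += 1 }  // empty tail partitions
        data.close()
        val index = new DataOutputStream(new FileOutputStream(indexPath))
        offsets.foreach(index.writeLong)                  // 2. one long per partition boundary
        index.close()
      }
    }

A reader that wants partition i then only needs the bytes in [offsets(i), offsets(i+1)) of the data file, which is why a single data file plus an index file suffices.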
  In this section, we will examine how Spark collects data from the previous stage into the next stage (the result task). Figure: after finishing the ShuffleMapTask computation (i.e. post-processing). Note: the last method 'reviveOffers()' is redundant in this mode, as step 13 will set up the next stage (ResultTask ...
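On the collecting side, a task of the next stage can locate its partition from the index file alone; a continuation of the sketch above, again assuming the same hypothetical file layout rather than Spark's actual shuffle-fetch path:

    import java.io.{DataInputStream, FileInputStream}
    import scala.collection.mutable.ArrayBuffer

    object ShuffleReadSketch {
      // Returns the records of one partition by slicing the map output file.
      def readPartition(dataPath: String, indexPath: String, partitionId: Int): Seq[String] = {
        val index = new DataInputStream(new FileInputStream(indexPath))
        index.skipBytes(partitionId * 8)      // each boundary is one 8-byte long
        val start = index.readLong()          // where this partition begins
        val end   = index.readLong()          // where the next one begins
        index.close()

        val data = new DataInputStream(new FileInputStream(dataPath))
        data.skipBytes(start.toInt)           // a real reader would seek; fine for a sketch
        val out = ArrayBuffer[String]()
        var pos = start
        while (pos < end) {
          val len = data.readInt()            // length prefix written by the sketch above
          val buf = new Array[Byte](len)
          data.readFully(buf)
          out += new String(buf, "UTF-8")
          pos += 4 + len
        }
        data.close()
        out.toSeq
      }
    }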
  Now we will dive into Spark internals via this simple example (WordCount; later articles will reference it by default): sparkConf.setMaster("local[2]") //-local[*] by default //leib-confs: output all the dependency logs sparkConf.set("spark.logLineage","tru ...
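For reference, here is the WordCount driver the snippet belongs to, reconstructed as a runnable sketch; the input path is a placeholder, and the two conf lines are taken from the excerpt:

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf().setAppName("WordCount")
        sparkConf.setMaster("local[2]")            // local[*] by default
        sparkConf.set("spark.logLineage", "true")  // log each job's RDD lineage (toDebugString)
        val sc = new SparkContext(sparkConf)

        sc.textFile("data/input.txt")              // placeholder input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .collect()
          .foreach(println)

        sc.stop()
      }
    }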
  Similar to other open source projects, Spark ships several shell scripts, listed here. sbin (server-side scripts): start-all.sh starts all the Spark daemons (i.e. start-master.sh, start-slaves.sh); start-master.sh starts up Spark's master process, delegating to "spark-daemon.sh ...
If you want to compare two objects for equality, you can use ==, or its negation != (this applies to all objects, not just primitive types).
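A quick illustration in Scala, where == delegates to equals and is null-safe, while eq compares references:

    val a = new String("spark")
    val b = new String("spark")

    println(a == b)    // true:  value equality via equals, works for any object
    println(a != b)    // false: the negation of ==
    println(a eq b)    // false: reference equality, two distinct instances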
With both the environment variable 'SPARK_PRINT_LAUNCH_COMMAND' and the --verbose flag enabled, spark-submit.sh prints the launch command in much more detail:   hadoop@GZsw04:~/spark/spark-1.4.1-bin-hadoop2.4$ spark-submit --master yarn --verbose --class org.apache.spark.examples.JavaWordCount lib/spark ...
ref: converting a Scala object to a Class; forced type casting in Scala
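Judging from those reference titles alone, the two techniques are getting a Class from a Scala object and forced type casting; a small sketch:

    val xs: Any = List(1, 2, 3)

    val cls: Class[_] = xs.getClass             // the runtime Class of a Scala object
    println(cls.getName)                        // scala.collection.immutable.$colon$colon

    if (xs.isInstanceOf[List[_]]) {             // check before a forced cast
      val ints = xs.asInstanceOf[List[Int]]     // forced cast: ClassCastException if wrong
      println(ints.sum)                         // 6
    }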

[spark-src] 1-overview

What is it? "Apache Spark™ is a fast and general engine for large-scale data processing. ... Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.", as stated on the Apache Spark site. Whether or not that claim always holds, I think certain key concepts/components to ...
  Based on: spark-1.4.1, hadoop-2.5.2. Going from simple to complex and following the working flow, we proceed in these steps: 1. [spark-src] spark overview; 2. [spark-src] core: from basic demos to diving into Spark internals. This section will involve many components, so it's much more detail ...
Google AlphaGo vs Lee on 'the game of Go' VS Back in Guangzhou, back in the arena. Cheers
env: hbase 0.94.26, zookeeper 3.4.3 --------------- 1. downed node: this morning we found a regionserver (host-34) down in our monitoring, so we dived into the HBase logs and found on this host:   2016-02-29 00:50:36,799 INFO [regionserver60020-SendThread(host-04:2181)] ClientCnxn.java:108 ...
  Spark Streaming lineage. ref: "Spark Streaming: 大规模流式数据处理的新贵" (Spark Streaming: the rising star of large-scale stream processing)
HBase QQ study and exchange group 476390228, focused on HBase technical discussion, though NoSQL-related database topics are welcome too; as the saying goes, learn one case and infer the rest. Cheers
    When I ran some simple SQL-related test programs, this exception occurred, which seems weird (spark-1.3.1 was used due to project needs): scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror at scala.reflect.internal. ...
  I want to export a table to JSON-format files, but after googling, no solutions turned up. I know that Pig is used to do SQL-like MapReduce work, and that Hive is a data warehouse that can be built on HBase, but I couldn't find a solution/workaround with those either (maybe I missed something). So I consider using MR to figure ...
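One possible workaround in the spirit of this series: since Spark is already at hand, scan the table via TableInputFormat and emit one JSON object per row. A sketch only: it assumes the HBase 0.96+ Cell API, a placeholder table name and output path, and does no real JSON escaping:

    import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration}
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.{SparkConf, SparkContext}

    object HBaseTableToJson {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("HBaseTableToJson").setMaster("local[2]"))

        val hconf = HBaseConfiguration.create()
        hconf.set(TableInputFormat.INPUT_TABLE, "my_table")   // placeholder table name

        val rows = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
          classOf[ImmutableBytesWritable], classOf[Result])

        val jsonLines = rows.map { case (rowKey, result) =>
          val cells = result.rawCells().map { cell =>
            val q = Bytes.toString(CellUtil.cloneQualifier(cell))
            val v = Bytes.toString(CellUtil.cloneValue(cell))
            s""""$q":"$v""""                                  // naive: no JSON escaping
          }
          val all = s""""rowkey":"${Bytes.toString(rowKey.get())}"""" +: cells
          all.mkString("{", ",", "}")                         // one JSON object per row
        }
        jsonLines.saveAsTextFile("hdfs:///tmp/my_table_json") // placeholder output path
        sc.stop()
      }
    }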