PySpark Streaming: error when saving real-time data to HBase, asking for expert guidance

哇塞你好帅 posted on 2018/06/08 10:27

from pyspark import SparkContext

sc = SparkContext(master="local[2]", appName="StreamingWordCount")

# TableOutputFormat reads the target table from "hbase.mapred.outputtable";
# "hbase.mapreduce.inputtable" is only consumed by TableInputFormat on the read path.
aconf = {"hbase.zookeeper.quorum": "192.168.159.148",
         "hbase.mapred.outputtable": "student",
         "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
         "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
         "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}

# Converters for the write path: Python string -> ImmutableBytesWritable (key)
# and Python list -> Put (value). The ...ToStringConverter pair is for reading.
keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"

rawData = ['3,info,name,Rongcheng', '4,info,name,Guanhua']
# (rowkey, [rowkey, column family, column name, value])
sc.parallelize(rawData) \
  .map(lambda x: (x.split(',')[0], x.split(','))) \
  .saveAsNewAPIHadoopDataset(keyConverter=keyConv, valueConverter=valueConv, conf=aconf)
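For contrast, the ImmutableBytesWritableToStringConverter / HBaseResultToStringConverter pair that the original code passed belongs to the read path, which goes through TableInputFormat. A minimal read sketch modeled on Spark's bundled hbase_inputformat.py example (a standalone snippet, not part of the original post):

from pyspark import SparkContext

sc = SparkContext(master="local[2]", appName="StreamingWordCount")

# Read path: "hbase.mapreduce.inputtable" is the right key here.
read_conf = {"hbase.zookeeper.quorum": "192.168.159.148",
             "hbase.mapreduce.inputtable": "student"}
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
    conf=read_conf)
print(hbase_rdd.collect())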

This code inserts the data of a single RDD; in principle I only need to feed the real-time RDDs into it (a sketch of that hookup follows below). Right now it throws a NullPointerException. I read data with a similar method and that one can fetch data fine, but this one reports null, and it reports null even when HBase is not started. Please advise: is it a problem with my parameters?
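For reference, a minimal sketch of how the single-RDD write is typically hooked into a DStream via foreachRDD; the socket source and its host/port are illustrative assumptions, not from the original post:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(master="local[2]", appName="StreamingWordCount")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

aconf = {"hbase.zookeeper.quorum": "192.168.159.148",
         "hbase.mapred.outputtable": "student",
         "mapreduce.outputformat.class": "org.apache.hadoop.hbase.mapreduce.TableOutputFormat",
         "mapreduce.job.output.key.class": "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
         "mapreduce.job.output.value.class": "org.apache.hadoop.io.Writable"}
keyConv = "org.apache.spark.examples.pythonconverters.StringToImmutableBytesWritableConverter"
valueConv = "org.apache.spark.examples.pythonconverters.StringListToPutConverter"

# Assumed source: lines such as "3,info,name,Rongcheng" arriving on a socket.
lines = ssc.socketTextStream("localhost", 9999)

def save_to_hbase(rdd):
    if not rdd.isEmpty():  # skip empty micro-batches
        rdd.map(lambda x: (x.split(',')[0], x.split(','))) \
           .saveAsNewAPIHadoopDataset(conf=aconf, keyConverter=keyConv,
                                      valueConverter=valueConv)

lines.foreachRDD(save_to_hbase)
ssc.start()
ssc.awaitTermination()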

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset.
: java.lang.NullPointerException
    at org.apache.hadoop.hbase.security.UserProvider.instantiate(UserProvider.java:122)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:214)
    at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:119)
    at org.apache.hadoop.hbase.mapreduce.TableOutputFormat.checkOutputSpecs(TableOutputFormat.java:177)
    at org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.write(SparkHadoopMapReduceWriter.scala:76)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1085)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1085)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
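Judging from the trace, the NullPointerException is thrown while TableOutputFormat.checkOutputSpecs instantiates an HBase Connection from its Configuration, before anything contacts ZooKeeper, which would explain why it fails the same way with HBase stopped. This looks less like a bad parameter value and more like a behavior commonly reported with Spark 2.2.0's rewritten saveAsNewAPIHadoopDataset path, where the OutputFormat's setConf is never called. As a stopgap that bypasses TableOutputFormat entirely, writes can go through the HBase Thrift API; a rough sketch using the third-party happybase package, assuming a Thrift server runs on the quorum host at the default port 9090 (both assumptions, not part of the original post):

import happybase
from pyspark import SparkContext

sc = SparkContext(master="local[2]", appName="StreamingWordCount")
rawData = ['3,info,name,Rongcheng', '4,info,name,Guanhua']

def write_partition(lines):
    # One Thrift connection per partition; the host is the quorum node from
    # the post, port 9090 is an assumed HBase Thrift server default.
    conn = happybase.Connection("192.168.159.148", port=9090)
    table = conn.table("student")
    for line in lines:
        rowkey, cf, col, value = line.split(",")
        table.put(rowkey.encode("utf-8"),
                  {(cf + ":" + col).encode("utf-8"): value.encode("utf-8")})
    conn.close()

sc.parallelize(rawData).foreachPartition(write_partition)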

哇塞你好帅

My versions are Spark 2.2 and HBase 1.4.

elisonwu

Has this problem been solved?
