In the previous post we discussed how to connect a Jupyter notebook to PySpark. Going a step further, in this post I will discuss how you can run Python scripts, and analyze and build machine learning models on top of data stored in …
Written by Supun Setunga
Prerequisites:
Install Jupyter
Download and uncompress the Spark 1.6.2 binary
Download pyrolite-4.13.jar

Set Environment Variables: open ~/.bashrc and add the following entries:

    export PYSPARK_DRIVER_PYTHON=ipython
    export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
    export PYSPARK_PYTHON=/home/supun/Supun/Softwares/anaconda3/bin/python
    export SPARK_HOME="/home/supun/Supun/Softwares/spark-1.6.2-bin-hadoop2.6"
    export PATH="/home/supun/Supun/Softwares/spark-1.6.2-bin-hadoop2.6/bin:$PATH"
    export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
    export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
    export PYTHONPATH=$SPARK_HOME/python/lib:$PYTHONPATH
    export SPARK_CLASSPATH=/home/supun/Downloads/pyrolite-4.13.jar

If you are …
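Once these variables are in place, a quick sanity check is to launch the notebook with the pyspark command and confirm the Spark context responds. A minimal sketch, assuming the notebook was started via pyspark so that the sc variable is pre-created by the shell (this check is not from the original post):

    # 'sc' is the SparkContext that PySpark pre-creates when the
    # notebook is launched through the pyspark command
    print(sc.version)  # should print 1.6.2

    # run a trivial job to confirm the executors respond
    rdd = sc.parallelize(range(100))
    print(rdd.sum())  # 4950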
Written by Supun Setunga
Prerequisites:
Install Python
Install IPython/Jupyter notebook

Create a directory as a workspace for the notebook, and navigate to it. Start Jupyter by running:

    jupyter notebook

Create a new Python notebook. To use a Pandas DataFrame in this notebook script, we first need to import …
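For instance, the first cell might look like the following minimal sketch; the column names and values here are illustrative, not taken from the original post:

    import pandas as pd

    # build a small DataFrame from an in-memory dictionary
    df = pd.DataFrame({'name': ['a', 'b', 'c'], 'value': [1, 2, 3]})

    # inspect the first rows
    print(df.head())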
Written by Supun Setunga
This post will discuss how to set up a fully distributed HBase cluster. Here we will not run ZooKeeper as a separate server, but will instead use the ZooKeeper instance embedded in HBase itself. Our setup will consist of 1 master …
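As a rough illustration of the embedded-ZooKeeper approach, the relevant switches live in HBase's own configuration files; the hostname below is a placeholder, not a value from the post:

    # conf/hbase-env.sh — let HBase manage its own ZooKeeper
    export HBASE_MANAGES_ZK=true

and in conf/hbase-site.xml:

    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>master-host</value>
      <!-- placeholder hostname for the node running the embedded ZooKeeper -->
    </property>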
Wrote by Supun Setunga