In the previous post we discussed how to connect Jupyter notebook to pyspark. In this post I will discuss how you can run Python scripts, and analyze and build Machine Learning models, on top of data stored in
Connect iPython/Jupyter Notebook to pyspark
/ September 15, 2016
Prerequisites
- Install jupyter
- Download and uncompress the spark 1.6.2 binary.
- Download pyrolite-4.13.jar

Set Environment Variables
Open ~/.bashrc and add the following entries:
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark
export PYSPARK_PYTHON=/home/supun/Supun/Softwares/anaconda3/bin/python
export SPARK_HOME="/home/supun/Supun/Softwares/spark-1.6.2-bin-hadoop2.6"
export PATH="/home/supun/Supun/Softwares/spark-1.6.2-bin-hadoop2.6/bin:$PATH"
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib:$PYTHONPATH
export SPARK_CLASSPATH=/home/supun/Downloads/pyrolite-4.13.jar

If you are
Written by Supun Setunga
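As a rough sanity check for the environment variables listed in the excerpt above, the following sketch reports which of them are missing before the notebook is started. The variable names come from the export entries in the post. The helper names `missing_spark_vars` and `py4j_on_pythonpath` are hypothetical and exist only for this illustration.

```python
import os

# Variable names taken from the export entries in the post excerpt.
REQUIRED_VARS = [
    "PYSPARK_DRIVER_PYTHON",
    "PYSPARK_DRIVER_PYTHON_OPTS",
    "PYSPARK_PYTHON",
    "SPARK_HOME",
]


def missing_spark_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` is any mapping of variable names to values; it defaults to
    the environment of the current process.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def py4j_on_pythonpath(env=None):
    """Return True if some PYTHONPATH entry has 'py4j' in its file name."""
    env = os.environ if env is None else env
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    return any("py4j" in os.path.basename(entry) for entry in entries)


if __name__ == "__main__":
    missing = missing_spark_vars()
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
    print("py4j archive on PYTHONPATH:", py4j_on_pythonpath())
```

Running the script in a new terminal after editing ~/.bashrc shows whether the entries were picked up. It only checks that the variables are present, not that the paths they point to exist.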
Basic DataFrame Operations in python
/ September 15, 2016
Prerequisites:
- Install python
- Install ipython notebook
- Create a directory as a workspace for the notebook, and navigate to it.
- Start jupyter by running: jupyter notebook
- Create a new python notebook.

To use a Pandas DataFrame in this notebook script, we first need to import
Written by Supun Setunga
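The excerpt above stops at the import step, so here is a minimal sketch of what a few basic DataFrame operations might look like once pandas is imported. The column names and values are made-up sample data, not taken from the original post.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "age": [25, 32, 47],
})

# Select a single column; the result is a pandas Series.
ages = df["age"]

# Keep only the rows that satisfy a boolean condition.
over_30 = df[df["age"] > 30]

# Compute a summary statistic over one column.
mean_age = df["age"].mean()

# Add a new column derived from an existing one.
df["age_next_year"] = df["age"] + 1

print(over_30)
print("Mean age:", mean_age)
```

In a notebook cell the `print` calls can be dropped, since the last expression in a cell is displayed automatically.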
Setting up a Fully Distributed HBase Cluster
/ September 01, 2016
This post discusses how to set up a fully distributed HBase cluster. Here we will not run ZooKeeper as a separate server, but will use the ZooKeeper instance that is embedded in HBase itself. Our setup will consist of 1 master
Written by Supun Setunga
Featured
JSON Manipulation with Ballerina
One of the features that sets Ballerina apart from most other programming languages is its first-class support for JSON and XML as buil...