In the previous post we discussed how to connect Jupyter Notebook to PySpark. Going forward, in this post I will discuss how you can run Python scripts, and analyze and build machine learning models on top of data stored in
Connect iPython/Jupyter Notebook to pyspark
/ September 15, 2016
Prerequisites
Install Jupyter.
Download and uncompress the Spark 1.6.2 binary.
Download pyrolite-4.13.jar.
Set Environment Variables
Open ~/.bashrc and add the following entries:
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS='notebook' pyspark
export PYSPARK_PYTHON=/home/supun/Supun/Softwares/anaconda3/bin/python
export SPARK_HOME="/home/supun/Supun/Softwares/spark-1.6.2-bin-hadoop2.6"
export PATH="/home/supun/Supun/Softwares/spark-1.6.2-bin-hadoop2.6/bin:$PATH"
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib:$PYTHONPATH
export SPARK_CLASSPATH=/home/supun/Downloads/pyrolite-4.13.jar
If you are
Written by Supun Setunga
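The three PYTHONPATH exports in the post above each prepend one entry, so the last export ends up first on the path. The following is a minimal Python sketch of that ordering. The function name pythonpath_entries and the example location /opt/spark are hypothetical and used only for illustration; the py4j file name is the one from the exports.

```python
import posixpath


def pythonpath_entries(spark_home, py4j_zip="py4j-0.9-src.zip"):
    """Return the PYTHONPATH entries in the order the shell would end up with.

    Mirrors three successive `export PYTHONPATH=<entry>:$PYTHONPATH` lines:
    every export puts its entry in front of whatever was there before.
    """
    exports_in_file_order = [
        posixpath.join(spark_home, "python", "lib", py4j_zip),
        posixpath.join(spark_home, "python"),
        posixpath.join(spark_home, "python", "lib"),
    ]
    entries = []
    for entry in exports_in_file_order:
        entries.insert(0, entry)  # prepend, like `<entry>:$PYTHONPATH`
    return entries


# Hypothetical Spark location, used only to show the resulting order.
for path in pythonpath_entries("/opt/spark"):
    print(path)
```

Because the py4j archive is exported first, it is searched last among the three entries.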
Basic DataFrame Operations in python
/ September 15, 2016
Prerequisites: Install Python. Install IPython Notebook. Create a directory as a workspace for the notebook, and navigate to it. Start Jupyter by running: jupyter notebook. Create a new Python notebook. To use a Pandas DataFrame in this notebook script, we first need to import
Written by Supun Setunga
Setting up a Fully Distributed HBase Cluster
/ September 01, 2016
This post discusses how to set up a fully distributed HBase cluster. Here we will not run ZooKeeper as a separate server, but will use the ZooKeeper instance embedded in HBase itself. Our setup will consist of 1 master
Written by Supun Setunga
Setting up a Fully Distributed Hadoop Cluster
/ August 09, 2016
Here I will discuss how to set up a fully distributed Hadoop cluster with 1 master and 2 slaves. The three nodes are set up on three different machines. Updating Hostnames: To start things off, let's first give hostnames to the three nodes.
Written by Supun Setunga
Obtain a Heap/Thread Dump
/ July 03, 2016
Heap Dump: jmap -dump:live,format=b,file=<filename>.hprof <PID>
Thread Dump: jstack <PID> > <filename>
Written by Supun Setunga
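If the heap and thread dump commands above need to be triggered from a script, they can be wrapped in a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes jmap and jstack from the JDK are on the PATH and that the caller is allowed to attach to the target process.

```python
import subprocess


def heap_dump_command(pid, filename):
    """Build the argv for: jmap -dump:live,format=b,file=<filename>.hprof <PID>"""
    return ["jmap", f"-dump:live,format=b,file={filename}.hprof", str(pid)]


def thread_dump_command(pid):
    """Build the argv for: jstack <PID>"""
    return ["jstack", str(pid)]


def take_heap_dump(pid, filename):
    # Runs jmap; raises CalledProcessError if the tool exits with an error.
    subprocess.run(heap_dump_command(pid, filename), check=True)


def take_thread_dump(pid, filename):
    # Equivalent of `jstack <PID> > <filename>`: stdout goes to the file.
    with open(filename, "w") as out:
        subprocess.run(thread_dump_command(pid), stdout=out, check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is attached to a live JVM.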
Check Database size in MySQL
/ June 21, 2016
Log in to MySQL with your username and password, e.g.: mysql -u root -proot
Then execute the following command:
SELECT table_schema "DB Name", ROUND(SUM(data_length + index_length)/1024/1024, 2) "Size in MBs" FROM information_schema.tables GROUP BY table_schema;
Here SUM(data_length + index_length) is in bytes. Hence we have
Written by Supun Setunga
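The arithmetic inside the query above can be checked by hand. The sketch below repeats the same conversion in Python: add the data and index lengths (both in bytes), divide by 1024 twice, and round to two decimal places. The function name size_in_mb and the sample byte counts are made up for illustration.

```python
def size_in_mb(data_length, index_length):
    """Convert a table's data + index size from bytes to megabytes.

    Same expression as ROUND((data_length + index_length)/1024/1024, 2)
    in the query, applied to a single pair of values.
    """
    total_bytes = data_length + index_length
    return round(total_bytes / 1024 / 1024, 2)


# Made-up sizes: 3 MiB of data plus 1 MiB of indexes.
print(size_in_mb(3 * 1024 * 1024, 1 * 1024 * 1024))
```

Note that SQL ROUND and Python round can differ on exact halfway values, so treat this as a check of the unit conversion rather than a bit-for-bit replica of the query.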
Stacking in Machine Learning
/ June 10, 2016






What is stacking? Stacking is one of the three widely used ensemble methods in Machine Learning and its applications. The overall idea of stacking is to train several models,
Written by Supun Setunga
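As a rough illustration of the stacking idea introduced above, here is a small pure-Python sketch in the commonly described form: a few base models are trained, and a second-level model is then fitted on their predictions for held-out data. Everything in it is an assumption made for the example and not the method from the post: the two base models (a constant mean predictor and a one-variable linear fit), the least-squares combiner, the half-and-half split, and all function names.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Base model 1: always predicts the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Base model 2: ordinary least-squares line through (xs, ys)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_meta(p1, p2, ys):
    """Meta model: weights (w1, w2) minimising sum((y - w1*p1 - w2*p2)^2).

    Solves the 2x2 normal equations directly; fails with ZeroDivisionError
    if the two prediction columns are collinear.
    """
    a = sum(u * u for u in p1)
    b = sum(u * v for u, v in zip(p1, p2))
    c = sum(v * v for v in p2)
    d = sum(u * y for u, y in zip(p1, ys))
    e = sum(v * y for v, y in zip(p2, ys))
    det = a * c - b * b
    return (d * c - b * e) / det, (a * e - b * d) / det


def fit_stack(xs, ys):
    """Train base models on the first half, the meta model on the second."""
    half = len(xs) // 2
    base = [fit(xs[:half], ys[:half]) for fit in (fit_mean, fit_line)]
    held_x, held_y = xs[half:], ys[half:]
    p1 = [base[0](x) for x in held_x]
    p2 = [base[1](x) for x in held_x]
    w1, w2 = fit_meta(p1, p2, held_y)
    return lambda x: w1 * base[0](x) + w2 * base[1](x)


# Toy data on the line y = 2x + 1, chosen only for the demonstration.
xs = [float(i) for i in range(8)]
ys = [2 * x + 1 for x in xs]
stacked = fit_stack(xs, ys)
print(stacked(10.0))
```

The meta model is fitted on data the base models have not seen, which is the point of the hold-out split: it learns how much to trust each base model rather than memorising their training fit.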
Custom Transformers for Spark Dataframes
/ May 24, 2016
In Spark, a transformer is used to convert one DataFrame into another. But due to the immutability of DataFrames (i.e., existing values of a DataFrame cannot be changed), if we need to transform values in a column, we have to create a new
Written by Supun Setunga
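The immutability constraint described above can be pictured without Spark at all. The sketch below is a plain-Python analogy, not Spark's API: a read-only "frame" maps column names to tuples, and a transformation returns a new frame with a derived column instead of altering the existing one. The names make_frame and with_transformed_column, and the sample data, are hypothetical.

```python
from types import MappingProxyType


def make_frame(columns):
    """Build a read-only 'frame': column name -> tuple of values."""
    return MappingProxyType(
        {name: tuple(values) for name, values in columns.items()}
    )


def with_transformed_column(frame, source, target, func):
    """Return a NEW frame with `target` computed from `source`.

    The input frame is left exactly as it was.
    """
    new_columns = dict(frame)  # shallow copy of the column mapping
    new_columns[target] = tuple(func(value) for value in frame[source])
    return make_frame(new_columns)


# Illustrative data only.
people = make_frame({"name": ["ann", "bob"], "age": [31, 45]})
older = with_transformed_column(people, "age", "age_plus_one", lambda a: a + 1)

print(sorted(people))          # original columns are unchanged
print(older["age_plus_one"])   # derived column lives in the new frame
```

Assigning to people["age"] raises a TypeError here, which is the analogue of not being able to change values in an existing DataFrame.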
Profiling with Java Flight Recorder
/ May 22, 2016
Java profiling can help you assess the performance of your program, improve your code, and identify defects such as memory leaks, high CPU usage, etc. Here I will discuss how to profile your code using the Java inbuilt utility
Written by Supun Setunga
Analytics for WSO2 ESB : Architecture in a Nutshell
/ May 22, 2016





ESB Analytics Server is the analytics distribution for the WSO2 ESB, which is built on top of WSO2 Data Analytics Server (DAS). Analytics for ESB consists of an inbuilt dashboard for statistics and tracing visualization for Proxy Services, APIs, Endpoints, Sequences and Mediators.
Written by Supun Setunga
Connect to MySQL Database Remotely
/ April 06, 2016
By default, access to MySQL databases is restricted to the server that is running MySQL itself. Hence, if we need to log in to the MySQL console or use a database from a remote server, we need to enable the relevant configs. Open
Written by Supun Setunga
Creating a Log Dashboard with WSO2 DAS - Part I
/ March 06, 2016








WSO2 Data Analytics Server (DAS) can be used to do various kinds of batch data analytics and create dashboards out of that data. In this blog, I will discuss how you can create a simple dashboard using the data read from
Written by Supun Setunga
Adding WSSE Header to a SOAP Request
/ February 25, 2016
Sample SOAP Message with WSSE Header:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:echo="http://echo.services.core.carbon.wso2.org">
  <soapenv:Header>
    <wsse:Security soapenv:mustUnderstand="1" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">
      <wsu:Timestamp wsu:Id="Timestamp-13" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">
        <wsu:Created>2015-05-21T04:17:56.541Z</wsu:Created>
        <wsu:Expires>2015-09-21T04:22:56.541Z</wsu:Expires>
      </wsu:Timestamp>
Written by Supun Setunga
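The visible part of the sample message above (envelope, header, wsse:Security and its wsu:Timestamp) can also be generated programmatically. The following is a sketch using only Python's standard library. It covers just the elements shown in the excerpt; the empty Body, the 300-second validity window, and the function names are assumptions for the example, and anything that follows the Timestamp in the original message is not reproduced.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SOAPENV = "http://schemas.xmlsoap.org/soap/envelope/"
WSSE = ("http://docs.oasis-open.org/wss/2004/01/"
        "oasis-200401-wss-wssecurity-secext-1.0.xsd")
WSU = ("http://docs.oasis-open.org/wss/2004/01/"
       "oasis-200401-wss-wssecurity-utility-1.0.xsd")

# Keep the same prefixes as the sample message when serialising.
for prefix, uri in (("soapenv", SOAPENV), ("wsse", WSSE), ("wsu", WSU)):
    ET.register_namespace(prefix, uri)


def iso_millis(moment):
    """Format a UTC datetime like 2015-05-21T04:17:56.541Z."""
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def build_envelope(created, ttl_seconds=300, timestamp_id="Timestamp-13"):
    """Build an Envelope with a wsse:Security header holding a Timestamp.

    ttl_seconds is an assumed validity window, not a value from the post.
    """
    envelope = ET.Element(f"{{{SOAPENV}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAPENV}}}Header")
    security = ET.SubElement(
        header, f"{{{WSSE}}}Security", {f"{{{SOAPENV}}}mustUnderstand": "1"}
    )
    timestamp = ET.SubElement(
        security, f"{{{WSU}}}Timestamp", {f"{{{WSU}}}Id": timestamp_id}
    )
    ET.SubElement(timestamp, f"{{{WSU}}}Created").text = iso_millis(created)
    expires = created + timedelta(seconds=ttl_seconds)
    ET.SubElement(timestamp, f"{{{WSU}}}Expires").text = iso_millis(expires)
    ET.SubElement(envelope, f"{{{SOAPENV}}}Body")  # empty placeholder body
    return envelope


now = datetime.now(timezone.utc)
print(ET.tostring(build_envelope(now), encoding="unicode"))
```

Passing the creation time in as an argument keeps the function deterministic, which makes it easier to compare the generated header against a known-good message.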
Seasonal TimeSeries Modeling with Gradient Boosted Tree Regression
/ January 27, 2016







Seasonal time series data can be easily modeled with methods such as Seasonal ARIMA, GARCH and Holt-Winters. These are readily available in statistical packages like R, Stata, etc. But if you want to model a seasonal time series using Java, there are only very
Written by Supun Setunga
About Me
Read | Learn | Share
Labels
- admin-service (1)
- annotation (1)
- authorization (1)
- ballerina (3)
- ballerinalang (3)
- bearer (1)
- cluster (2)
- dataframes (2)
- esb (1)
- find (1)
- hadoop (1)
- hbase (1)
- hdfs (1)
- heap-dump (1)
- IBM (1)
- ibm-mq (1)
- java (2)
- java-mission-control (1)
- jcmd (1)
- jfr (1)
- jmap (1)
- jstack (1)
- linux (3)
- logs (1)
- machine-learning (4)
- ml (3)
- mllib (1)
- mutualSSL (1)
- mysql (2)
- oath2 (1)
- pandas (1)
- performance (1)
- profiling (1)
- pyspark (2)
- python (3)
- R (1)
- randomForest (1)
- regression (1)
- security (2)
- siddhi (1)
- soap (1)
- spark (5)
- ssl (1)
- stacking (1)
- thread-dump (1)
- timeseries (1)
- tomcat (1)
- vfs (1)
- WebSphere (1)
- wso2 (15)
- wso2-ballerina (3)
- wso2apim (1)
- wso2das (3)
- wso2esb (6)
- wso2is (1)
- wso2ml (1)
- wsse (1)
- xpath (1)
Featured
JSON Manipulation with Ballerina
One of the features that makes Ballerina stand out from most other programming languages is its first-class support for JSON and XML as buil...