Basic DataFrame Operations in Python

Prerequisites:


  • Install Python
  • Install Jupyter Notebook (formerly IPython Notebook)

Create a directory to serve as a workspace for the notebook, and navigate to it. Start Jupyter by running:
jupyter notebook

Create a new Python notebook. To use pandas DataFrames in this notebook, we first need to import the pandas library (along with NumPy) as follows:
import numpy as np
import pandas as pd



Importing a Dataset


To import a CSV file from the local file system:
filePath = "/home/supun/Supun/MachineLearning/data/Iris/train.csv"
irisData = pd.read_csv(filePath)
print(irisData)

The output will be as follows (NaN marks a missing value):
     sepal_length  sepal_width  petal_length  petal_width
0             NaN          3.5           1.4          0.2
1             NaN          3.0           1.4          0.2
2             NaN          3.2           1.3          0.2
3             NaN          3.1           1.5          0.2
4             NaN          3.6           1.4          0.2
5             NaN          3.9           1.7          0.4
6             NaN          3.4           1.4          0.3
7             NaN          3.4           1.5          0.2
8             NaN          2.9           1.4          0.2
9             NaN          3.1           1.5          0.1
10            NaN          3.7           1.5          0.2
11            NaN          3.4           1.6          0.2
12            NaN          3.0           1.4          0.1
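
To try read_csv without a file on disk, here is a minimal sketch that reads the same kind of data from an in-memory string. The variable names (csv_text, sample) and the three data rows are made up for illustration; they are not taken from the train.csv file above.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a file on disk.
csv_text = """sepal_length,sepal_width,petal_length,petal_width
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # column names taken from the header row
print(sample.head(2))        # only the first two rows
```

Checking shape and columns right after loading is a quick way to confirm that the header row and the separator were interpreted as expected.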


Basic Retrieve Operations


Get a single column of the dataset. Say we want to get all the values of the column "sepal_length":
print(irisData["sepal_length"])

Get multiple columns of the dataset. Say we want to get all the values of the columns "sepal_length" and "petal_length":
print(irisData[["sepal_length", "petal_length"]])
# Note the two pairs of square brackets: the inner pair is a list of column names.

Get a subset of rows of the dataset. Say we want to get the first 10 rows of the dataset:
print(irisData[0:10])
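
The selections above return different kinds of objects, which matters when we chain further operations. The following is a small sketch on a made-up four-row DataFrame (df and its values are illustrative, not the Iris file):

```python
import pandas as pd

# Small made-up DataFrame used only for illustration.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5],
})

one_column = df["sepal_length"]      # single brackets: a Series
one_col_frame = df[["sepal_length"]]  # double brackets: a DataFrame with one column
first_rows = df[0:2]                 # slice: a DataFrame with the rows at positions 0 and 1

print(type(one_column).__name__)
print(type(one_col_frame).__name__)
print(first_rows.shape)
```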

Get a subset of rows of a single column. Say we want to get the first 10 rows of the column "sepal_length":
print(irisData["sepal_length"][0:10])
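
As an alternative to chaining two sets of brackets, pandas also provides the iloc (position-based) and loc (label-based) indexers. The sketch below uses a made-up DataFrame with the default 0, 1, 2, ... index and shows that all three forms select the same values in that case:

```python
import pandas as pd

# Small made-up DataFrame with the default integer index.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5],
})

chained = df["sepal_length"][0:2]           # column first, then a row slice
by_position = df["sepal_length"].iloc[0:2]  # positions 0 and 1 (end is excluded)
by_label = df.loc[0:1, "sepal_length"]      # labels 0 through 1 (end label is included)

print(by_position)
print(by_label)
```

Note that loc includes the end label of a slice while iloc excludes the end position, which is why the slice is 0:1 in one case and 0:2 in the other.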


Basic Math Operations

Add a constant to each value of a column in the dataset:
print(irisData["sepal_length"] + 5)

Add two (or more) columns in the dataset:
print(irisData["petal_width"] + irisData["petal_length"])
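Here the values are added row-wise, i.e., the value in the n-th row of the petal_width column is added to the value in the n-th row of the petal_length column. (More precisely, pandas aligns the two columns on their index labels, which coincide with row positions here.)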

Similarly, we can apply other math operations such as subtraction (-), multiplication (*), and division (/).
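
To see these operations side by side, here is a short sketch on a made-up DataFrame (the values are illustrative only). One row contains a missing value to show that NaN propagates through arithmetic, which is why adding a constant to a column that is entirely NaN, like sepal_length in the output shown earlier, still yields NaN.

```python
import numpy as np
import pandas as pd

# Made-up values; the last petal_length is deliberately missing.
df = pd.DataFrame({
    "petal_length": [1.5, 4.5, np.nan],
    "petal_width": [0.25, 1.5, 0.5],
})

shifted = df["petal_length"] + 5                     # constant added to every row
total = df["petal_length"] + df["petal_width"]       # row-wise sum of two columns
difference = df["petal_length"] - df["petal_width"]  # row-wise subtraction
product = df["petal_length"] * df["petal_width"]     # row-wise multiplication
ratio = df["petal_length"] / df["petal_width"]       # row-wise division

# Collect the results in one DataFrame for easier comparison.
# The last row is NaN in every column because petal_length is missing there.
print(pd.DataFrame({
    "shifted": shifted,
    "total": total,
    "difference": difference,
    "product": product,
    "ratio": ratio,
}))
```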
