Basic DataFrame Operations in Python
Prerequisites:
- Install Python
- Install Jupyter Notebook (formerly IPython Notebook)
Create a directory to use as a workspace for the notebook, and navigate to it. Start Jupyter by running:
jupyter notebook
Create a new Python notebook. To use a pandas DataFrame in this notebook, we first need to import the pandas library as follows.
import numpy as np
import pandas as pd
Importing a Dataset
To import a CSV file from the local file system:
filePath = "/home/supun/Supun/MachineLearning/data/Iris/train.csv"
irisData = pd.read_csv(filePath)
print(irisData)
Output will be as follows:
    sepal_length  sepal_width  petal_length  petal_width
0            NaN          3.5           1.4          0.2
1            NaN          3.0           1.4          0.2
2            NaN          3.2           1.3          0.2
3            NaN          3.1           1.5          0.2
4            NaN          3.6           1.4          0.2
5            NaN          3.9           1.7          0.4
6            NaN          3.4           1.4          0.3
7            NaN          3.4           1.5          0.2
8            NaN          2.9           1.4          0.2
9            NaN          3.1           1.5          0.1
10           NaN          3.7           1.5          0.2
11           NaN          3.4           1.6          0.2
12           NaN          3.0           1.4          0.1
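If the CSV file is not at hand, the same call can be tried on in-memory text. The following is only a minimal sketch: the sample rows and the names csvText and sampleData are made up for illustration and are not taken from the real train.csv.

```python
import io

import pandas as pd

# Hypothetical sample in CSV form; these rows are illustrative only.
csvText = """sepal_length,sepal_width,petal_length,petal_width
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
"""

# pd.read_csv accepts a file path or a file-like object such as StringIO.
sampleData = pd.read_csv(io.StringIO(csvText))

print(sampleData.shape)          # (number of rows, number of columns)
print(list(sampleData.columns))  # column names taken from the header row
```

Checking the shape and the column names right after loading is a quick way to confirm that the header row was parsed as expected.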
Basic Retrieve Operations
Get a single column of the dataset. Say we want to get all the values of the column "sepal_length":
print(irisData["sepal_length"])
Get multiple columns of the dataset. Say we want to get all the values of the columns "sepal_length" and "petal_length":
print(irisData[["sepal_length", "petal_length"]])  # Note the two pairs of square brackets.
Get a subset of rows of the dataset. Say we want to get the first 10 rows of the dataset:
print(irisData[0:10])
Get a subset of rows of a single column of the dataset. Say we want to get the first 10 rows of the column "sepal_length":
print(irisData["sepal_length"][0:10])
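The retrieval operations above can be tried without the Iris file. This is a rough sketch on a small frame built in memory; sampleData and its values are hypothetical stand-ins for irisData.

```python
import pandas as pd

# Hypothetical stand-in for irisData: 12 rows, two columns.
sampleData = pd.DataFrame({
    "sepal_length": [5.0 + 0.1 * i for i in range(12)],
    "petal_length": [1.0 + 0.1 * i for i in range(12)],
})

singleColumn = sampleData["sepal_length"]                  # one column: a Series
twoColumns = sampleData[["sepal_length", "petal_length"]]  # list of columns: a DataFrame
firstTenRows = sampleData[0:10]                            # rows 0 through 9
firstTenValues = sampleData["sepal_length"][0:10]          # first 10 values of one column

print(type(singleColumn).__name__, type(twoColumns).__name__)
print(len(firstTenRows), len(firstTenValues))
```

Selecting with a single column name returns a Series, while selecting with a list of names returns a DataFrame, which is why the second form needs the extra pair of brackets.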
Basic Math Operations
Add a constant to each value of a column in the dataset:
print(irisData["sepal_length"] + 5)
Add two (or more) columns in the dataset:
print(irisData["petal_width"] + irisData["petal_length"])
Here the values are added row-wise, i.e. the value in the n-th row of the petal_width column is added to the value in the n-th row of the petal_length column.
Similarly, we can apply other math operations such as subtraction (-), multiplication (*), and division (/).
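As a small sketch of these row-wise operations, the snippet below applies each operator to two columns. The frame sampleData and its values are hypothetical, chosen so that the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical values for illustration; not taken from the Iris dataset.
sampleData = pd.DataFrame({
    "petal_length": [1.5, 2.0, 4.0],
    "petal_width": [0.5, 1.0, 2.0],
})

shifted = sampleData["petal_length"] + 5                             # constant added to every row
difference = sampleData["petal_length"] - sampleData["petal_width"]  # row-wise subtraction
product = sampleData["petal_length"] * sampleData["petal_width"]     # row-wise multiplication
ratio = sampleData["petal_length"] / sampleData["petal_width"]       # row-wise division

print(shifted.tolist())     # [6.5, 7.0, 9.0]
print(difference.tolist())  # [1.0, 1.0, 2.0]
print(product.tolist())     # [0.75, 2.0, 8.0]
print(ratio.tolist())       # [3.0, 2.0, 2.0]
```

Each result is a new Series with one value per row; the original columns are left unchanged.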