How to group data and plot line graphs

This is my first time using pandas and the IPython notebook, and I could not figure out the right search terms for my problem.

I have an .xls file with compile-time data for 3 build servers located at 3 sites: A, B and C. These build servers compile multiple projects, so I will pick one specific project. I need to plot the data like this (for one specific project, not all projects in one graph, to keep it simple):

X-axis = date
Y-axis = average build time on that date

3 lines for sites A, B and C

What I have done so far:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

file = r'/home/abc/Downloads/request.xls'
df = pd.read_excel(file, parse_dates=['Date'])

build_times = df[['Date','site','project','Duration']]
build_group = build_times.groupby(['Date','site','project']).mean()

I need help with the following:

  1. How do I select only successful builds, given a status column containing 0 and 1?

  2. How do I plot the lines for sites A, B and C (for a specific project) with the X and Y axes described above?


For your first question, you are looking for Boolean indexing in pandas.

build_times = df[['Date','site','project','Duration','Status']]
# keep only the rows where the build succeeded (use the column's exact name from your file)
build_times = build_times.loc[build_times['Status'] == 1]
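
As a small illustration of the Boolean indexing above, here is a sketch on a made-up dataframe. The sample rows and the helper name successful_builds are assumptions for the example; only the column names come from the question.

```python
import pandas as pd


def successful_builds(df, status_col="Status"):
    """Return only the rows whose status column equals 1 (hypothetical helper)."""
    mask = df[status_col] == 1      # Boolean Series, one True/False per row
    return df.loc[mask]


# Made-up sample data, not taken from the real request.xls
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-02"]),
    "site": ["A", "B", "A"],
    "project": ["example", "example", "example"],
    "Duration": [120.0, 95.0, 130.0],
    "Status": [1, 0, 1],
})

ok = successful_builds(sample)
print(ok)
```

The mask can also be combined with other conditions using & and |, with each condition wrapped in parentheses.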

For the second question, average the durations per date, site and project, filter the result down to one project, and draw one line per site on the same graph like this:

import seaborn as sns
import matplotlib.pyplot as plt

# mean Duration per (Date, site, project), keeping the keys as ordinary columns
avg = build_times.groupby(['Date','site','project'], as_index=False)['Duration'].mean()
# keep a single project ('example' is a placeholder project name)
avg = avg[avg['project'] == 'example']

# one line per site: x = date, y = average build time on that date
sns.lineplot(data=avg, x='Date', y='Duration', hue='site')

plt.xlabel('Date')
plt.ylabel('Average build time')
plt.show()
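
A pandas-only way to get one line per site, sketched here under assumptions, is to pivot the data so that each site becomes a column and then call DataFrame.plot. The sample data, the project name "example" and the helper name average_by_site are made up for illustration; the Agg backend line is only there so the sketch runs without a display and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd


def average_by_site(df, project):
    """Mean Duration per date (rows) and site (columns) for one project (hypothetical helper)."""
    one_project = df[df["project"] == project]
    return one_project.pivot_table(index="Date", columns="site",
                                   values="Duration", aggfunc="mean")


# Made-up sample data, not taken from the real request.xls
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-01",
                            "2016-01-02", "2016-01-02"]),
    "site": ["A", "A", "B", "A", "B"],
    "project": ["example"] * 5,
    "Duration": [100.0, 120.0, 90.0, 130.0, 80.0],
})

avg = average_by_site(sample, "example")
ax = avg.plot(marker="o")          # one line per column, i.e. per site
ax.set_xlabel("Date")
ax.set_ylabel("Average build time")
```

Because the pivoted table has one column per site, adding a fourth site to the data would add a fourth line without changing the plotting code.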
