Example notebook¶

[1]:

import perfume
import perfume.analyze
import pandas as pd
import bokeh.io
bokeh.io.output_notebook()

Loading BokehJS ...

Setup¶

To start, set up some functions to benchmark.

[2]:

import time
import numpy as np

def test_fn_1():
    good = np.random.poisson(20)
    bad = np.random.poisson(100)
    msec = np.random.choice([good, bad], p=[.99, .01])
    time.sleep(msec / 3000.)

def test_fn_1_no_outliers():
    time.sleep(np.random.poisson(20) / 3000.)

def test_fn_2():
    good = np.random.poisson(5)
    bad = np.random.poisson(150)
    msec = np.random.choice([good, bad], p=[.95, .05])
    time.sleep(msec / 3000.)

def test_fn_3():
    msec = max(1, np.random.normal(100, 10))
    time.sleep(msec / 3000.)

numbers = np.arange(0, 1, 1. / (3 * 5000000))

def test_fn_4():
    return np.sum(numbers)

# Create a variable named "samples", in this cell.  This way,
# if we change these functions, we'll reset the samples so we
# don't use old data with changed implementations.
samples = None

Benchmark¶

Run the benchmark for a while by executing this cell. Since we capture the output data in samples, and pass it back in as an argument, you can interrupt the cell, take a look at the output so far, and then execute this cell again to resume the benchmark.

[3]:

samples = perfume.bench(test_fn_1, test_fn_2, test_fn_3, test_fn_4,
                        samples=samples)

Descriptive Timing Statistics
function	test_fn_1	test_fn_2	test_fn_3	test_fn_4
count	158	158	158	158
mean	7.1	3.42	33.5	13.5
std	2.38	8.59	3.04	2.19
min	3.28	0.548	25.7	8.19
25%	5.91	1.34	31.4	12.9
50%	6.9	1.84	33.6	14.2
75%	7.97	2.35	35.7	14.9
max	30	56.7	41.3	16.9

K-S test
	test_fn_2	test_fn_3	test_fn_4
K-S test Z
test_fn_1	8.49	8.83	7.65
test_fn_2	nan	8.61	8.55
test_fn_3	nan	nan	8.89

Bucketed K-S test
	test_fn_2	test_fn_3	test_fn_4
K-S test Z
test_fn_1	16	22	22
test_fn_2	nan	22	22
test_fn_3	nan	nan	22

Analyzing the samples¶

Let’s look at the format of the output, each function execution gets its begin and end time recorded:

[4]:

samples.head()

[4]:

function	test_fn_1		test_fn_2		test_fn_3		test_fn_4
timing	begin	end	begin	end	begin	end	begin	end
0	6.668990e+07	6.668991e+07	6.668991e+07	6.668991e+07	6.668991e+07	6.668994e+07	6.668994e+07	6.668995e+07
1	6.668995e+07	6.668995e+07	6.668995e+07	6.668995e+07	6.668995e+07	6.668998e+07	6.668998e+07	6.668999e+07
2	6.668999e+07	6.669000e+07	6.669000e+07	6.669000e+07	6.669000e+07	6.669004e+07	6.669004e+07	6.669004e+07
3	6.669004e+07	6.669005e+07	6.669005e+07	6.669006e+07	6.669006e+07	6.669009e+07	6.669009e+07	6.669010e+07
4	6.669010e+07	6.669011e+07	6.669011e+07	6.669011e+07	6.669011e+07	6.669014e+07	6.669014e+07	6.669015e+07

One thing we can do is plot each function’s distribution as it develops over simulated time:

[5]:

perfume.analyze.cumulative_quantiles_plot(samples)

We can run a K-S test and see whether our functions are significantly different:

[6]:

perfume.analyze.ks_test(perfume.analyze.timings(samples))

[6]:

	test_fn_2	test_fn_3	test_fn_4
K-S test Z
test_fn_1	8.494414	8.831940	7.650598
test_fn_2	NaN	8.606922	8.550668
test_fn_3	NaN	NaN	8.888194

We can convert them to elapsed timings instead of begin/end time points, get resampled timings to see outliers show a stronger presence, or isolate samples to be as if they ran by themselves

[7]:

timings = perfume.analyze.timings(samples)
bt = perfume.analyze.bucket_resample_timings(samples)
isolated = perfume.analyze.isolate(samples)
isolated.head()

[7]:

function	test_fn_1		test_fn_2		test_fn_3		test_fn_4
timing	begin	end	begin	end	begin	end	begin	end
0	0.000000	7.879083	0.000000	1.532343	0.000000	29.188473	0.000000	9.990919
1	7.879083	13.432623	1.532343	2.720641	29.188473	59.487223	9.990919	19.318506
2	13.432623	21.638016	2.720641	3.269045	59.487223	93.088024	19.318506	28.697242
3	21.638016	29.194047	3.269045	6.157080	93.088024	128.720489	28.697242	39.406731
4	29.194047	34.441334	6.157080	7.378798	128.720489	160.600259	39.406731	48.569794

With these, and other charting libraries, you can do whatever you want with the data:

[8]:

from bokeh import palettes
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

fig, ax = plt.subplots(figsize=(16, 9))
for col, color in zip(timings.columns, palettes.Set1[len(timings.columns)]):
    sns.distplot(timings[col], label=col, color=color, ax=ax,
#                  hist_kws=dict(cumulative=True),
#                  kde_kws=dict(cumulative=True)
                )
ax.set_xlabel('millis')
ax.legend()
timings.describe()

[8]:

function	test_fn_1	test_fn_2	test_fn_3	test_fn_4
count	158.000000	158.000000	158.000000	158.000000
mean	7.099615	3.423204	33.526883	13.455989
std	2.382655	8.588543	3.039051	2.193190
min	3.280499	0.548404	25.672770	8.190510
25%	5.908776	1.335742	31.432720	12.909524
50%	6.904165	1.840656	33.613588	14.226354
75%	7.966824	2.353760	35.680228	14.863801
max	29.985672	56.681542	41.308623	16.934982

[9]:

import matplotlib.pyplot as plt
timings['test_fn_1'].hist(cumulative=True, normed=1, alpha=0.3)
timings['test_fn_2'].hist(cumulative=True, normed=1, alpha=0.3)

[9]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f94c9fa8f28>

[10]:

import matplotlib.pyplot as plt
bt['test_fn_1'].hist(cumulative=True, normed=1, alpha=0.3)
bt['test_fn_2'].hist(cumulative=True, normed=1, alpha=0.3)

[10]:

<matplotlib.axes._subplots.AxesSubplot at 0x7f94c9d21a58>

[11]:

sns.pairplot(timings#, diag_kws={'cumulative': True}
            )

[11]:

<seaborn.axisgrid.PairGrid at 0x7f94c9ad46a0>

[12]:

import scipy.stats
bt = perfume.analyze.bucket_resample_timings(samples)
(scipy.stats.ks_2samp(timings['test_fn_1'], timings['test_fn_2']),
 scipy.stats.ks_2samp(bt['test_fn_1'], bt['test_fn_2']))

[12]:

(Ks_2sampResult(statistic=0.95569620253164556, pvalue=5.5872505324246181e-65),
 Ks_2sampResult(statistic=0.71199999999999997, pvalue=4.691029271698989e-223))