Example notebook

[1]:
import perfume
import perfume.analyze
import pandas as pd
import bokeh.io
bokeh.io.output_notebook()
Loading BokehJS ...

Setup

To start, set up some functions to benchmark.

[2]:
import time
import numpy as np

def test_fn_1():
    good = np.random.poisson(20)
    bad = np.random.poisson(100)
    msec = np.random.choice([good, bad], p=[.99, .01])
    time.sleep(msec / 3000.)

def test_fn_1_no_outliers():
    time.sleep(np.random.poisson(20) / 3000.)

def test_fn_2():
    good = np.random.poisson(5)
    bad = np.random.poisson(150)
    msec = np.random.choice([good, bad], p=[.95, .05])
    time.sleep(msec / 3000.)

def test_fn_3():
    msec = max(1, np.random.normal(100, 10))
    time.sleep(msec / 3000.)

numbers = np.arange(0, 1, 1. / (3 * 5000000))

def test_fn_4():
    return np.sum(numbers)

# Create a variable named "samples", in this cell.  This way,
# if we change these functions, we'll reset the samples so we
# don't use old data with changed implementations.
samples = None

Benchmark

Run the benchmark for a while by executing this cell. Since we capture the output data in samples, and pass it back in as an argument, you can interrupt the cell, take a look at the output so far, and then execute this cell again to resume the benchmark.

[3]:
samples = perfume.bench(test_fn_1, test_fn_2, test_fn_3, test_fn_4,
                        samples=samples)
Descriptive Timing Statistics
function test_fn_1 test_fn_2 test_fn_3 test_fn_4
count 158 158 158 158
mean 7.1 3.42 33.5 13.5
std 2.38 8.59 3.04 2.19
min 3.28 0.548 25.7 8.19
25% 5.91 1.34 31.4 12.9
50% 6.9 1.84 33.6 14.2
75% 7.97 2.35 35.7 14.9
max 30 56.7 41.3 16.9
K-S test
test_fn_2 test_fn_3 test_fn_4
K-S test Z
test_fn_1 8.49 8.83 7.65
test_fn_2 nan 8.61 8.55
test_fn_3 nan nan 8.89
Bucketed K-S test
test_fn_2 test_fn_3 test_fn_4
K-S test Z
test_fn_1 16 22 22
test_fn_2 nan 22 22
test_fn_3 nan nan 22

Analyzing the samples

Let’s look at the format of the output, each function execution gets its begin and end time recorded:

[4]:
samples.head()
[4]:
function test_fn_1 test_fn_2 test_fn_3 test_fn_4
timing begin end begin end begin end begin end
0 6.668990e+07 6.668991e+07 6.668991e+07 6.668991e+07 6.668991e+07 6.668994e+07 6.668994e+07 6.668995e+07
1 6.668995e+07 6.668995e+07 6.668995e+07 6.668995e+07 6.668995e+07 6.668998e+07 6.668998e+07 6.668999e+07
2 6.668999e+07 6.669000e+07 6.669000e+07 6.669000e+07 6.669000e+07 6.669004e+07 6.669004e+07 6.669004e+07
3 6.669004e+07 6.669005e+07 6.669005e+07 6.669006e+07 6.669006e+07 6.669009e+07 6.669009e+07 6.669010e+07
4 6.669010e+07 6.669011e+07 6.669011e+07 6.669011e+07 6.669011e+07 6.669014e+07 6.669014e+07 6.669015e+07

One thing we can do is plot each function’s distribution as it develops over simulated time:

[5]:
perfume.analyze.cumulative_quantiles_plot(samples)

We can run a K-S test and see whether our functions are significantly different:

[6]:
perfume.analyze.ks_test(perfume.analyze.timings(samples))
[6]:
test_fn_2 test_fn_3 test_fn_4
K-S test Z
test_fn_1 8.494414 8.831940 7.650598
test_fn_2 NaN 8.606922 8.550668
test_fn_3 NaN NaN 8.888194

We can convert them to elapsed timings instead of begin/end time points, get resampled timings to see outliers show a stronger presence, or isolate samples to be as if they ran by themselves

[7]:
timings = perfume.analyze.timings(samples)
bt = perfume.analyze.bucket_resample_timings(samples)
isolated = perfume.analyze.isolate(samples)
isolated.head()
[7]:
function test_fn_1 test_fn_2 test_fn_3 test_fn_4
timing begin end begin end begin end begin end
0 0.000000 7.879083 0.000000 1.532343 0.000000 29.188473 0.000000 9.990919
1 7.879083 13.432623 1.532343 2.720641 29.188473 59.487223 9.990919 19.318506
2 13.432623 21.638016 2.720641 3.269045 59.487223 93.088024 19.318506 28.697242
3 21.638016 29.194047 3.269045 6.157080 93.088024 128.720489 28.697242 39.406731
4 29.194047 34.441334 6.157080 7.378798 128.720489 160.600259 39.406731 48.569794

With these, and other charting libraries, you can do whatever you want with the data:

[8]:
from bokeh import palettes
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

fig, ax = plt.subplots(figsize=(16, 9))
for col, color in zip(timings.columns, palettes.Set1[len(timings.columns)]):
    sns.distplot(timings[col], label=col, color=color, ax=ax,
#                  hist_kws=dict(cumulative=True),
#                  kde_kws=dict(cumulative=True)
                )
ax.set_xlabel('millis')
ax.legend()
timings.describe()
[8]:
function test_fn_1 test_fn_2 test_fn_3 test_fn_4
count 158.000000 158.000000 158.000000 158.000000
mean 7.099615 3.423204 33.526883 13.455989
std 2.382655 8.588543 3.039051 2.193190
min 3.280499 0.548404 25.672770 8.190510
25% 5.908776 1.335742 31.432720 12.909524
50% 6.904165 1.840656 33.613588 14.226354
75% 7.966824 2.353760 35.680228 14.863801
max 29.985672 56.681542 41.308623 16.934982
_images/example_16_1.png
[9]:
import matplotlib.pyplot as plt
timings['test_fn_1'].hist(cumulative=True, normed=1, alpha=0.3)
timings['test_fn_2'].hist(cumulative=True, normed=1, alpha=0.3)
[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f94c9fa8f28>
_images/example_17_1.png
[10]:
import matplotlib.pyplot as plt
bt['test_fn_1'].hist(cumulative=True, normed=1, alpha=0.3)
bt['test_fn_2'].hist(cumulative=True, normed=1, alpha=0.3)
[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f94c9d21a58>
_images/example_18_1.png
[11]:
sns.pairplot(timings#, diag_kws={'cumulative': True}
            )
[11]:
<seaborn.axisgrid.PairGrid at 0x7f94c9ad46a0>
_images/example_19_1.png
[12]:
import scipy.stats
bt = perfume.analyze.bucket_resample_timings(samples)
(scipy.stats.ks_2samp(timings['test_fn_1'], timings['test_fn_2']),
 scipy.stats.ks_2samp(bt['test_fn_1'], bt['test_fn_2']))
[12]:
(Ks_2sampResult(statistic=0.95569620253164556, pvalue=5.5872505324246181e-65),
 Ks_2sampResult(statistic=0.71199999999999997, pvalue=4.691029271698989e-223))