Monday, 29 June 2015

taming our inner dinosaur

Why we need to call out put-downs.
Our enlightened selves exert rather weak control over our everyday behaviour, and every one of us is only too ready to think of ourselves as less prejudiced than the average person. It will be very difficult to root out the often subtle put-downs of women and other members of out-groups that slip into references or discussions. We can detect them more easily in others than in ourselves, so we can help each other by calling them out. Calling out unacceptable remarks made by Fellows in public is a case in point.

For all my social networking posts, see my Google+ page

Thursday, 25 June 2015

Wednesday, 24 June 2015

jaguars and leopards and cheetahs, oh my!

we are now vulnerable to some specific kinds of sofas
or why the recent improvement in neural net image recognition might be problematic


Tuesday, 23 June 2015

Monday, 22 June 2015

the calm after the storm

Typical summer’s day.  It has been hammering down with rain.  When it stopped, and the sun came out, the garden was a glorious picture against the still black sky:

Sunday, 21 June 2015

graphic research

Pictures, graphics, charts, etc are very important in communication.  Here’s a whole dissertation presented in graphic novel form.  This makes a pleasant change from too many papers on the importance of graphics that themselves include not a single graphic!


great troll quote

Here's a great troll quote as seen in the comments somewhere on BoingBoing.

Troll quotes explained.


airports lie

I had a nice hop over to Amsterdam on Friday (flew out Thursday night, flew back Friday night) for an interesting Colloquium.

Here’s the obligatory photo from my hotel bedroom window, in the centre of Amsterdam:

There were three pieces of information about Schiphol airport that, taken together, could have been problematic.

Easyjet boarding pass: Gate closes 30 minutes prior to departure
Board announcing departure gate initially: Gate is announced 40 minutes prior to departure
Board announcing departure gate later: walking time from here to gate, 26 minutes

Fortunately there are two lies in that information.
The gate opens less than 30 minutes prior to departure.
The walking time was more like 10 minutes.

However, I did spend a considerable amount of time walking around airports last week.

Saturday, 20 June 2015

criticising criticising criticism of a criticism

Here are a few interesting posts about the Tim Hunt story:

In Sympathy for the Devil?, Michael Eisen provides some interesting background to the original story.  That background makes it even worse.

Another thoughtful entry in the debate is Janet D. Stemwedel’s post on why Good Scientists Should Publicly Criticize Tim Hunt's Claims.

I had assumed there were women involved in the history of Hunt’s discoveries.  Helen Cahill’s Guardian piece provides some details on The unseen women scientists behind Tim Hunt’s Nobel prize.

So, the story so far:
  • Tim Hunt criticises women (oh, my bad, “girls”) for being far too sensitive to criticism to do science.  When explicitly asked about his statements, he said he was trying to be honest,
  • Several women (and some men) call out Hunt for his blatant sexism.  And an amusing line of “distractingly sexy” photos gets tweeted.  
  • Hunt appears rather sensitive to this criticism.
  • Many men (and some women) then criticise these people for criticising Hunt, using oh so reasoned and non-hyperbolic terms like “baying witch hunt” and “feeding frenzy of mob-rule self-righteousness”.
So, when a man makes a sweeping statement of how women should be kept out of the lab, it’s just a joke, banter, unimportant.  When women call him out on it, that’s a witch-hunt, stifling academic freedom, a lynch mob, thought police.

The usual double-standard at play.

It’s not so much the original comments, bad enough as they are, as the doubling down and not-pologies and subsequent whinging, that causes the fury, by the way.  Other prominent people have made a huge gaffe, had it brought to their attention, realised they did wrong, apologised sincerely, and things have moved on (apart from a few people trying to say they were bullied into apologising, because, hey, why else would they have done so?).  Tim Hunt needs to learn about the first rule of holes.

Oh, and if an eminent scientist were to say that labs should be racially-segregated, or something equally racist, and was called out on it, would these people leap to his defence then, saying it was just a joke?  I suspect not.  So what does that tell you about how the defenders view this issue?

Thursday, 18 June 2015

it went so fast

So, apparently today is the 200th anniversary of Abba winning the Eurovision Song Contest, or something.

heterotic computing special issue

Over the last year or so, my coeditors and I have been preparing a special issue of Philosophical Transactions of the Royal Society A, on the topic of heterotic computing, the combining of two or more computational substrates to get interesting computational properties.

The issue has just been published.  Massive thanks to the journal staff: it has been a pleasure to work with them (even when they were nagging us to do things :-)

We have written an introduction to the issue, which you can read online:

     Viv Kendon, Angelika Sebald, Susan Stepney
     Heterotic computing: exploiting hybrid computational devices
     Phil. Trans. R. Soc. A 2015 373 20150091; doi: 10.1098/rsta.2015.0091.

We also have a paper of our own in the issue:

     Viv Kendon, Angelika Sebald, Susan Stepney
     Heterotic computing: past, present and future
     Phil. Trans. R. Soc. A 2015 373 20140225; doi: 10.1098/rsta.2014.0225.

You will need a journal subscription to read that one, until the open access embargo period is up.  (btw, the journal arranged a separate anonymous peer review of our paper: we didn't organise it ourselves!)

If you want to find out more about the genesis of the special issue, there is also a blog post at the Royal Society publishing blog,

Tuesday, 16 June 2015

why being a postdoc sucks

Sabine Hossenfelder on “why being a postdoc sucks”:
The plight of the postdocs: Academia and mental health


Sunday, 14 June 2015

Saturday, 13 June 2015

exploring my hard drive

Why oh why doesn’t the Windows file explorer show the size of directories?  It’s a real pain when the drive is filling up, and you need to find out where all the space has gone.  Today I noticed my 1TB hard drive was nearly full.  That’s just ridiculous.  My very first hard drive, way back in the day, was 28MB.  Yes, megabytes.  1TB is essentially infinite!

Rather than trying to walk up and down the explorer tree, looking for big files, I had a google to see what other people are doing.  I discovered WinDirStat.  Easy to install, easy to run.

My hard drive usage now looks like:

Pretty!  And informative.  The rectangles are colour coded by file type.  The big grey rectangle is free space: it was invisibly small when I started.  Freeing up space was easy once I found where all the space was going.  I deleted several very large zip files I had been keeping “just in case”.  Then I deleted them from the recycle bin.  Sigh.

It’s very easy to explore what all the space is being used for, both from this graphical view, and through a more conventional file listing view.  Clicking on a rectangle highlights the relevant directory or file in the listing view, and vice versa.  I spent rather more time wandering around the drive than I needed to, even after I’d deleted some files.  A very nice tool.  And now I have room for it, and a lot lot more, on my drive.

sequestering carbon, several books at a time XLVI

The latest batch:

I got The Goblin Emperor since it has been nominated for a (non slate) Hugo, yet I had never heard of Katherine Addison (although I have heard of Sarah Monette).  I’m clearly falling behind with more than just my reading!

Friday, 12 June 2015

technology in orbit

Lovely clear sky last night; around 10:15pm BST we watched the very bright ISS go overhead. It started very near Jupiter (which is easy to find as it’s near Venus) and brightened as it arced across the sky.

Then, 25 minutes later, we watched an Iridium flare, also close to Jupiter.

Yay technology!!

Update 14 Jun 2015:

So, people get very nice pictures of Iridium events close to (on top of!) Jupiter.


Thursday, 11 June 2015

Tim Hunt gives dinosaurs a bad name

Seen on BuzzFeed:

An invitation-only lunch in honor of women in science has a shockingly sexist speaker who admits that he has a reputation as a chauvinist.

Who issued the invitations?

Seen later on the BBC:
He told the BBC he “did mean” the remarks but was “really sorry”.
Presumably he was “really sorry” because he got called out for saying what he said?  What else could he be sorry for, given he “did mean” what he said?
he said he was “really sorry that I said what I said”, adding it was “a very stupid thing to do in the presence of all those journalists”
Ah yes, he noticed it was a stupid thing to say in the presence of all those journalists (not that it was a stupid thing to say, full stop), but only because all those journalists then went off and reported on what he said?


flaming June

Sunday maintained its record as the sunniest day since (our) records began for a whole four days.  Behold Thursday:

56.82 kWh generated today: a whole 0.2 kWh more than Sunday.  And it’s clearly possible to do even better: it was slightly hazy around midday.  Will we break the 57 kWh barrier?

Wednesday, 10 June 2015

How many times should you run your simulation?

You have a new algorithm, say an evolutionary algorithm.  You want to show it is better than another algorithm.  But its behaviour is stochastic, varying from run to run.  How many times do you need to run it to show what you want to show?  How big a sample of possible results do you need? 10?  100?  1000?  More?

This is a known problem, solved by statisticians, used by scientists who want to know how many experiments they need to run to investigate their hypotheses.

First I recall some statistical terminology (null hypothesis, statistical significance, statistical power, effect size), then describe the way to calculate how many runs you need, using this terminology.  If you know this terminology, you can skip ahead to the final table.

null hypothesis

The null hypothesis, H0, is usually a statement of the status quo, of the null effect: the treatment has no effect; the algorithm has the same performance; etc.  The alternative hypothesis, H1, is that the effect is not null.  You are (usually) seeking to refute the null hypothesis in favour of the alternative hypothesis.

statistical significance and statistical power

Given a null hypothesis H0 and a statistical test, there are four possibilities.  H0 may or may not be true, and the test may or may not refute it.

                        H0 true                         H0 false
     refute H0          type I error, false positive    correct
     fail to refute H0  correct                         type II error, false negative

The type I error, or false positive, erroneously concludes that the null hypothesis is false; that is, that there is an effect when there is not. The probability of this error (of refuting H0 given that H0 is true) is called α, or the statistical significance level.

     α = prob(refute H0|H0) = prob(false positive)

The type II error, or false negative, erroneously fails to refute the null hypothesis; that is, it is a failure to detect an effect that exists.  The probability of this error (of not refuting H0, given that H0 is not true) is called β.

     β = prob(not refute H0|not H0) = prob(false negative)

1−β is called the statistical power of the test, the probability of correctly refuting H0 when H0 is not true.

     power = 1−β = prob(refute H0|not H0)

Clearly, we would like to minimise both α and β, minimising both false positives and false negatives.  However, these aims are in conflict.  For example, we could trivially minimise α by never refuting H0, but this would maximise β, and vice versa.  The smaller either of them needs to be, the more runs are needed.
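The meaning of α can be checked empirically with a small simulation (my own sketch, not part of the original post): draw many pairs of samples from the *same* distribution, so that H0 is true, and count how often a t-test at α = 0.05 wrongly refutes it. The false positive rate should come out near 5%.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
alpha = 0.05
trials = 2000

# Both samples come from the same N(0, 1), so H0 (equal means) is true:
# every refutation is a type I error, a false positive.
false_positives = 0
for _ in range(trials):
    a = rng.normal(0.0, 1.0, 30)
    b = rng.normal(0.0, 1.0, 30)
    _, p = ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

rate = false_positives / trials
print(f"false positive rate: {rate:.3f}")  # close to alpha = 0.05
```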

Different problems might put different emphasis on α and β.  Consider the case of illness diagnosis, versus your new algorithm.

Diagnostic test for an illness

  • false positive: the test detects illness when there is none, which might result in worry and unnecessary treatment
  • false negative: the test fails to detect the illness, which might result in death
So minimising β may take preference in such a case.

New evolutionary algorithm

  • false positive: you claim your algorithm is better when it isn’t; this will be embarrassing for you later when more runs fail to support your claim
  • false negative: you fail to detect that your algorithm is better; so you don’t publish
So minimising α may take preference in such a case.


A statistical test results in a p-value.  p is the probability of observing an effect at least as large as the one seen, given that the null hypothesis holds.

     p = prob(obs|H0)

A low p-value means a low probability of observing the effect if H0 holds; the conclusion is then that H0 (probably) doesn’t hold.  We refute H0 with confidence level 1−p.

To refute H0 we want the observed p-value to be less than the initially chosen statistical significance α.  A typical confidence level is 95%, or α = 0.05.
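A p-value comes straight out of any standard test. As an illustration (mine, not from the post), scipy’s two-sample t-test returns one; here the two samples genuinely differ in mean, so p should fall below α = 0.05:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
# Sample B has a genuinely higher mean, by half a standard deviation.
A = rng.normal(0.0, 1.0, 1000)
B = rng.normal(0.5, 1.0, 1000)

t, p = ttest_ind(A, B)
print(f"p = {p:.2e}")
if p < 0.05:
    print("refute H0 at the 95% confidence level")
```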

effect size

The more runs we make, the smaller α and β can be.  That means we can refute H0 at higher and higher confidence level.  In fact, with enough runs we can refute almost any null hypothesis at a given confidence level, since any change to an algorithm probably makes some difference to the outcome.

Alternatively, we can use the higher number of runs to detect smaller and smaller deviations from H0.

For example, if H0 is that the means of the two samples A and B are the same:

     H0 : μA = μB

and the alternative hypothesis H1 is that the means are different:

     H1 : μA ≠ μB

then with more runs we can detect smaller and smaller differences in the means at a given significance level.  We can measure a smaller and smaller effect size.

Cohen’s d is one measure of effect size for normally distributed samples: the difference in means normalised by the standard deviation:

     d = (μA − μB) / σ

So an effect size of 1 occurs when the means are separated by one standard deviation.  Cohen gives names to different effect sizes: he calls 0.2 “small”, 0.5 “medium”, and 0.8 “large” effect sizes.  The smaller the effect size you want to detect, the more runs you need.

d = 0.2, “small”

d = 0.5, “medium”

d = 0.8, “large”
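Cohen’s d is easy to compute directly from two samples. A minimal sketch (the function name is mine), using the pooled standard deviation for two equal-sized samples:

```python
import numpy as np

def cohens_d(a, b):
    """Effect size: difference in means over the pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# Means differ by 2, pooled sd is sqrt(2.5) ~ 1.58: a "large" effect.
d = cohens_d([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"d = {d:.3f}")  # -1.265
```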

“Small” effect sizes might be fine in certain cases, where a small difference can nevertheless be valuable (either in a product, or in a theory).  However, for an algorithm to be worth publishing, you should probably aim for at least a “medium” effect size, if not a “large” one.

sample size

The required sample size, or number of runs, n, depends on your desired values of three parameters: (i) statistical significance α, (ii) statistical power 1−β, and (iii) effect size d.

For two samples of the same size, normally distributed, with H0 : μA = μB, then

     n = 2( (z(1−α/2) + z(1−β)) / d )²

where z is the inverse normal cumulative probability distribution function.  In the case of a single sample A compared against a fixed mean value, H0 : μA = μ0, this number can be halved.

A Python script that calculates this value of n (rounded up to an integer) is
from math import ceil
from scipy.stats import norm
n = ceil( 2 * ( (norm.ppf(1-alpha/2) + norm.ppf(power)) / d ) ** 2 )

If you want a significance at the 95% level (α = 0.05), and a power of 90%, and want to detect a large effect, then you need 33 runs; if you want to detect a medium effect you need 85 runs, and to detect a small effect you need over 500 runs.  (These numbers are based on the assumption that your samples are normally distributed.  Similar results follow for other distributions.)
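As a check, wrapping the formula in a function (the name is mine) reproduces those numbers:

```python
from math import ceil
from scipy.stats import norm

def runs_needed(alpha, power, d):
    """Sample size per group for a two-sample comparison of means."""
    return ceil(2 * ((norm.ppf(1 - alpha/2) + norm.ppf(power)) / d) ** 2)

print(runs_needed(0.05, 0.90, 0.8))  # 33  : "large" effect
print(runs_needed(0.05, 0.90, 0.5))  # 85  : "medium" effect
print(runs_needed(0.05, 0.90, 0.2))  # 526 : "small" effect
```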

At first sight, it might seem strange that the number of runs is somehow independent of the “noisiness” of the algorithm: why don’t we need more runs for an inherently noisy algorithm (large standard deviation) than for a very stable one (small standard deviation)?  The reason is that the effect size also depends on the standard deviation: for a given difference in the means, the effect size decreases as the standard deviation increases.

So there you have it.  You don’t have to guess the number of runs, or just do “lots” to be sure: you can calculate n.  And it’s probably less than you would have guessed.

Tuesday, 9 June 2015

nice train food -- really!

Past security in the Brussels Eurostar departure lounge there is a cafe-bar that sells brie, honey and walnut baguettes. Delicious!

(Well, it was actually a brie, honey, walnut and rocket baguette, but I removed the rocket before eating.)


Monday, 8 June 2015

Sunday, best day ever!

Seven weeks ago, I reported our sunniest day ever, as measured by our solar PV production.  Well, this Sunday was better:

The horizontal time axis runs from 3:00am to 9:00pm GMT. The vertical axis runs from zero to 8kW.

More than half a kWh better!  And given the lower peak value, and the cloud-caused dips in the generation, this extra is clearly due to the longer day in June than in April.

Last June was a bit rubbish for sun.

Tuesday, 2 June 2015


This is why you should never throw anything away!
