Arduino Flickering Candle

Update December 24, 2013: Mokus refined his code so that the distribution is now well-behaved (nearly normal) and the PSD no longer turns up at high frequencies. The plots and post have been updated to reflect this change. He will push the code to the same link when it is available.


In my previous post on Candle Flame Flicker I describe the statistics of the optical intensity flicker from a candle in terms of the probability density of the brightness samples and in terms of the power spectral density of the brightness. In this article I discuss how to make an Arduino drive an LED to flicker in a way that is statistically similar to the measurements I took on a real candle.

In the measurements I observed the spectral roll-off of the candle to start between about 5 and 8 Hz and to decline at a nominal rate of around 44 dB for each decade increase in frequency. A 2nd-order infinite impulse response (IIR) filter is computationally efficient, using little memory and requiring few calculations. However, the Arduino is not good at floating-point arithmetic: floating point is done in software, has relatively few bits of precision, and is roughly 4 to 40 times slower than fixed-point (integer) math (useful benchmarks are hard to find). The basic process is to create white noise and then filter it to the correct spectral shape. Afterward, the signal is scaled to have the appropriate variance and offset by a constant to represent the average brightness of the “flame”.
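
As a floating-point reference for that process (not the fixed-point Arduino code itself), here is a minimal Python sketch; the 8 Hz cutoff and 60 Hz sample rate anticipate the design described below, and the target mean and standard deviation are placeholder PWM values.

```python
import numpy as np
from scipy import signal

fs = 60.0          # sample rate, Hz
fc = 8.0           # approximate start of the candle's spectral roll-off, Hz
n = 10000

# 2nd-order lowpass Butterworth IIR
b, a = signal.butter(2, fc / (fs / 2.0))

# white noise -> spectrally shaped noise
white = np.random.randn(n)
shaped = signal.lfilter(b, a, white)

# scale to a target flicker depth and offset to the average "flame" brightness
target_std, target_mean = 25.0, 128.0        # placeholder 8-bit PWM values
led = shaped * (target_std / shaped.std()) + target_mean
led = np.clip(led, 0, 255).astype(np.uint8)  # values suitable for analogWrite()
```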

The approach I used was to design the IIR in Python with scipy’s signal module. I specified a 2nd-order lowpass Butterworth filter with a cutoff frequency of 8 Hz and a sample frequency of 60 Hz. I normalized the coefficients to work in a 16-bit integer space, following mainly Randy Yates’ 2010 Practical Considerations in Fixed-Point FIR Filter Implementations. From a synthesis perspective, there is some prior art. Philip Ching, a student at Cornell, synthesized candle noise quite cleverly, though he neither reported nor replicated the correct statistics. A fellow with the handle Mokus did a very, very tiny implementation for a microcontroller with only 64 bytes of RAM. He helped me modify his code so I could compare his statistics, and after adjustment his spectrum and distribution match fairly well. The high-frequency end of his PSD looks a little different from the other methods, but the difference may not be noticeable to an observer. Finally, there was Eric Evenchick’s coincidental post on Hackaday. Mokus reported that Evenchick’s implementation has too slow an update rate; I noticed that Evenchick did not report the statistics he was targeting nor what he achieved. I did not recreate his work to test.
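
The coefficient normalization step looks roughly like the following; the choice of a Q14 scale factor here is my assumption, not necessarily what the released code uses.

```python
import numpy as np
from scipy import signal

# 2nd-order Butterworth, 8 Hz cutoff at a 60 Hz sample rate
b, a = signal.butter(2, 8.0 / 30.0)

# Scale to fixed point.  Q14 leaves headroom in a signed 16-bit word;
# rounding the scaled coefficients is the "normalization" step.
Q = 14
b_int = np.round(b * (1 << Q)).astype(np.int16)
a_int = np.round(a * (1 << Q)).astype(np.int16)
print(b_int, a_int)   # a_int[0] should come out as 1 << Q
```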

Then, on to the tests. I was interested in building and comparing statistics from 16-bit and 32-bit implementations, each in both Direct Form I and Direct Form II. Indeed, I had great difficulty getting the filters to perform because I kept misdesigning the integer coefficients and overflowing the registers. While I worked out the 2nd-order filter approach, I also created a 4-stage digital equivalent of an RC filter. The 4-stage RC approach was always stable, though it needed a higher sample rate and much more processor power to create statistically consistent results. On the other hand, it has a more accurate spectrum. A comparison of three different 16-bit methods to Mokus’ implementation and to the actual measurements is shown in the figure below. The legend shows the mean, standard deviation, and their ratio to the right of each label. All my filters did a credible job of reconstructing the histogram.
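
Of the approaches above, the digital RC cascade is the easiest to get right in integer math. Here is a rough sketch of the per-sample update; the shift amount (which sets the cutoff) and the number of stages are placeholders to tune.

```python
def rc_cascade_step(x, state, k=4, stages=4):
    """One sample through a cascade of one-pole lowpass filters.

    Each stage computes y += (x - y) >> k, the integer equivalent of an
    RC filter with a time constant of roughly 2**k samples.  `state` is a
    list of `stages` integers that persists between calls.
    """
    for i in range(stages):
        state[i] += (x - state[i]) >> k
        x = state[i]
    return x
```

Feeding this with integer white noise and then scaling and offsetting the output, as described earlier, follows the same overall recipe; the arithmetic ports directly to 16- or 32-bit integers on the Arduino.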

[Figure: histogram comparison of the 16-bit implementations]

The power spectral density (PSD) of the different methods tells a different story. The Direct Form II 16-bit filter is the most visually appealing of the methods I tried. It rolls off more like the physical data than any of the other methods except the 4-stage RC filter, and the Direct Form II filter is more computationally efficient.

[Figure: PSD comparison of the 16-bit implementations]

The results for the 32-bit versions show greater variance than the 16-bit versions, but the quality is not markedly better.

[Figure: histogram comparison of the 32-bit implementations]

[Figure: PSD comparison of the 32-bit implementations]


I wrote proof code for the Arduino UNO both to see it flicker and to test the processor speed (separate parts of the code). Compiling with Arduino 1.0.3 produced a 4,722-byte program that calculated 10,000 new values in 6,292 ms, or about 629 microseconds per value. In theory this could support a sample rate of nearly 1.6 kHz. Put another way, at the 60 Hz update rate the final code uses about 629 µs of every 17 ms, or roughly 4% of the processor capability of the Arduino UNO. That leaves a lot of resources available for other Arduino work, or maybe means it can fit in a cheaper processor.

I have released two pieces of code under the GNU Public License 3, you can get the Python I used for filter design and the Arduino test code at the links. If you want the data, please contact me through the comments and I am willing to provide it.


Ballistics and Repeatability

I have a rifle that shoots 3-inch 5-shot groups at 100 yards.  This is not adequate for hunting, where shots are commonly 200 to 300 yards distant.  As I work toward resolving this, there are some interesting statistical problems of practical importance.

In each experiment, how many shots should I fire to measure the spread: 3, 5, or 10?

Having fired these shots, how should I measure the “spread”?

When comparing different groups, how likely is it that I will reach a true conclusion with my measurement?

The two methods for measuring spread are (1) the distance between the two farthest points and (2) the standard deviation of the radial distances.  The standard deviation is much more difficult to measure: you would probably record an x and y position for each shot, calculate the average position, find each shot’s distance from that average position, and then take the standard deviation of those distances.  The maximum point-to-point distance is very easy to measure.  Given the workload, the standard deviation would have to provide better results than the maximum distance to be worth it.  For example, if the standard deviation of a 3-shot group provided a better measurement than the maximum spread of a 5-shot group, then it would save ammunition and precious range time.
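
Both measures are easy to state precisely in code.  Here is a minimal sketch, assuming each shot is recorded as an (x, y) pair in inches:

```python
import numpy as np
from itertools import combinations

def max_spread(shots):
    """Largest center-to-center distance between any two shots in a group."""
    shots = np.asarray(shots, dtype=float)
    return max(np.hypot(*(a - b)) for a, b in combinations(shots, 2))

def radial_std(shots):
    """Standard deviation of each shot's distance from the group's mean point."""
    shots = np.asarray(shots, dtype=float)
    center = shots.mean(axis=0)
    radii = np.hypot(*(shots - center).T)
    return radii.std(ddof=1)
```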

To compare the performance of the two measurements I assumed a model.  The model consists of two different ammunitions (different brand of bullet, amount of powder, whatever).  Ammunition A produces a group with standard deviation of 1 inch at 100 yards.  This is just an example—a standard deviation of 1 inch would constitute really dreadful performance.  Ammunition B produces a group with a standard deviation between one and two inches.

The contrast ratio is defined as the standard deviation of B divided by the standard deviation of A.  If the contrast ratio is 1 then both kinds of ammunition perform the same.  We expect that it will be hard to tell the difference between ammunitions when the contrast ratio is near one.  The other critical element of the model is the statistical distribution.  Just like everyone else, I used the Gaussian.  I think that the results don’t generalize—heavy-tailed distributions probably produce different outcomes.

Then I simulated 10,000 groups, each containing 3, 4, 5, …,10 shots.  I simulated contrast ratios between 1 and 1.5.  With the data sets I considered a few metrics:

  • What are my chances of correctly determining which ammunition produced a larger group, using the sample standard deviation and the maximum spread measurements?
  • On average how well can I measure the contrast ratio, using the sample standard deviation and the maximum spread measurements?
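
Here is a minimal sketch of the simulation described above, reusing the two spread measures defined earlier; the group model (a zero-centered Gaussian in x and y) follows the assumptions stated in the model section.

```python
import numpy as np

rng = np.random.default_rng()

def compare(contrast=1.5, shots=5, trials=10_000, sigma_a=1.0):
    """Fraction of trials in which ammunition B measures larger than A,
    once by maximum spread and once by radial standard deviation."""
    wins_max = wins_std = 0
    for _ in range(trials):
        a = rng.normal(0.0, sigma_a, size=(shots, 2))             # ammunition A
        b = rng.normal(0.0, sigma_a * contrast, size=(shots, 2))  # ammunition B
        wins_max += max_spread(b) > max_spread(a)
        wins_std += radial_std(b) > radial_std(a)
    return wins_max / trials, wins_std / trials
```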

The following figure shows that, using the maximum spread measurement, for a contrast ratio of 1.5 a 3-shot group provides about a 76% chance of correctly determining that ammunition B produces a larger group.  A 5-shot group raises that chance to about 85%.

[Figure: probability of a correct comparison using the maximum spread]

Next, the same comparison using the sample standard deviations shows that the same 3-shot group differentiates correctly about 70% of the time, and a 5-shot group about 75% of the time.  The difference is stark—no sane person would ever prefer measuring the standard deviation over the maximum.

[Figure: probability of a correct comparison using the standard deviation]

How well can you quantify how much better A is than B?  The following chart shows the average ratio of the maximum spread from A divided by the maximum spread from B.  One observation is that by 10 shots the results appear to be biased.  On average the contrast ratio is underestimated.  The bias seems to be consistent enough to be corrected.

[Figure: sample contrast ratio estimated from the maximum spread]

The ratio of the sample standard deviations, as the next plot shows, may be less biased.  Even at 10 shots, though, the estimator is terrible.  As with the maximum spread technique, the bias may be correctable.  The slow rate of convergence is not heartening.

[Figure: sample contrast ratio estimated from the standard deviation]

My conclusion is that for small contrast ratios a 5-shot group is barely adequate, though coarse differences can be revealed.  I will not waste time with exotic measurements of the spread—the maximum spread appears to be superior to any of the other methods I have tried.  Without any information in advance of the test I would prepare for a five-shot comparison.  Some test of statistical power should probably be applied, analogous to Student’s t-test but for the maximum spread.  Obviously, similar spreads should be treated with caution.

This study was performed with Monte Carlo methods because the analytical solutions for maximum-spread probabilities are relatively complex—n-fold convolutions of distributions.  I believe the solutions are available, but I lazily elected computer horsepower over Park-power.

Coffee: Model Fitting Goodness

You have seen the response function contours through the stationary point, and you have seen the 3D response function visualizations. You may recall that I have two competing models: one is fit to the response surface experiment data alone, and the other to all that data plus the screening results. Is either model any good? Which one is better?

First, a quick review of the model:

q = b0 T^2 + b1 t^2 + b2 r^2 + b3 T t + b4 t r + b5 T r + b6 T + b7 t + b8 r + b9

Fitting produces numbers for all those coefficients, b0, b1, etc. Look at the goodness of the individual coefficients in the following table. The more asterisks in the significance column, the better the coefficient corresponds to the data. Notice that the “all data” model has an overall terrible goodness, with a residual standard error of 0.75, compared to 0.358 for the RSO data only.

Residual standard error: 0.749 (All Data), 0.358 (RSO Data Only)

                  All Data                          RSO Data Only
                  Estimate   Std. Error   Signif.   Estimate   Std. Error   Signif.
(Intercept)        3.344     0.3963       ***        3.6667    0.2064       ***
temp              -0.355     0.2369                 -0.4375    0.1264       *
time               0.235     0.2369                  0.0375    0.1264
cwrat              0.4       0.2369                  0.275     0.1264       .
temp:time         -0.5       0.2901                 -0.75      0.1788       **
temp:cwrat         0.02      0.2901                 -0.375     0.1788       .
time:cwrat         0.09      0.2901                 -0.075     0.1788
temp^2            -0.401     0.3592                 -0.6833    0.1861       *
time^2            -0.201     0.3592                 -0.4833    0.1861       *
cwrat^2            0.174     0.3592                 -0.1083    0.1861

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The two graphs below show the quantile-normal plots for the residuals from each model. The all-data model looks more systematically erroneous than the RSO-only model. The systematic error suggests that the choice of model is not particularly good. Furthermore, it suggests that the F-statistic estimates should be considered hesitantly, as F-statistics are calculated by assuming normally distributed independent errors.

With hesitation then, let’s examine the F-statistic for these models. The RSO data model produces an F-statistic of 6.4, well above the Box, Hunter, and Hunter criterion that the F-statistic exceed 4. On the other hand, the all-data model’s F-statistic is a miserly 1.06.

                          All Data   RSO Data Only
Degrees of Freedom        9          5
Residual Standard Error   0.749      0.358
F-statistic               1.06       6.43
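
For reference, a fit like this takes only a few lines. This sketch uses Python's statsmodels rather than the R I actually used, and the file and column names (q, temp, time, cwrat) are assumptions about how the data frame is laid out.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# one row per trial with columns q, temp, time, cwrat (coded -1..+1)
df = pd.read_csv("coffee_rso.csv")   # hypothetical file name

fit = smf.ols(
    "q ~ temp + time + cwrat"
    " + temp:time + temp:cwrat + time:cwrat"
    " + I(temp**2) + I(time**2) + I(cwrat**2)",
    data=df,
).fit()

print(fit.summary())                             # coefficients, std errors, significance
print("residual std error:", np.sqrt(fit.mse_resid))
print("F-statistic:", fit.fvalue)
```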

In conclusion, I get a better fit—to the indicated model—using only some of the data. This is, frankly, an indictment of the model. Anyone can carefully select data to produce a good fit to a model, and the fact that I have done so may be considered a round condemnation of my objectivity. I may reconsider the model, and attempt to drop the least important terms while adding either a three-way interaction or a two-way interaction involving squared terms. Perhaps more creative math will be required.

Coffee: Visualizing the Response Function

I have been calling my 3D response function a “surface”—which is linguistically feckless at best. Of course, I have some company: the immortal Box, Hunter, and Hunter call the experiment design a response surface method. So note carefully, the response function is not a surface; it completely fills a volume.

Naturally, this is hard to visualize. The contour plots are slices through the response space, but I really have a hard time inferring what my coffee quality function is telling me. Better visualization to the rescue.

First, put the contour plots together into slices, using just the shaded parts and not the lines. Note that these cuts run through the center of the design, not the stationary point.

This is still quite difficult to interpret. The blue and green corners of the areas are clearly to be avoided, while the red-tinted areas near the top are to be embraced. Red, in all these plots, is a higher value of q than blue. The weird little balls are the points on the steepest ascent experiment path. Their radius corresponds to their predicted q values. Tufte, of course, would have a fit; clearly the volume should be proportional to the indicated variable.

Those planar cuts are nice to look at, but we can do even better. Add “isosurfaces”—these are surfaces that lie on constant values of q. In two dimensions a topo map shows lines of constant elevation. For 3D data, the lines become surfaces. In the following figure I’ve kept the horizontal cut, and added isosurfaces. The big black line shows the steepest ascent.

This is getting useful. To make a good cup of coffee, you want to be somewhere in the red cup on the upper-left-front corner. The zone is not enormous. To more clearly visualize this, here is the same concept from a few different angles.

So, here we are, and I still don’t have a good idea what the response function looks like outside the design space. Clearly there is a region in the middle which is relatively high quality, and it appears that higher C/W ratios are favored.

Finally, to get a solid sense of the response function I thought it would be helpful to zoom out. The next graph shows the response function extrapolated over a much wider range.

The little box in the middle is the design space. In this case there are four isosurfaces drawn, for q=1,2,3,4. With this visualization you can clearly see the shapes represented by the response function. I’m sure that the function is a poor representation of truth in the regions far from the design space. Nevertheless, this diagram is more helpful to me than any other, as I can see there is clearly a cup-shaped region in which my coffee should be really awesome. The region is surprisingly narrow over C/W ratio, time, and temperature. A poor-man’s sensitivity analysis.

All the previous analysis has been on the response surfaces for the RSO data set only; that is, I left out the original data from the screening experiment. If I include that data the fit remains qualitatively similar in that there is a tasty region in the upper-left-hand part of the original design space. However, the rest seems very different. There is now a saddle point, and more of the design space is better than before. Now, too, the steepest ascent path points toward increasing C/W ratio and decreasing temperature. Unlike the previous fit, though, a substantial decrease in time is inferred. Furthermore, this surface admits quality values in excess of 5. The upper isosurface is q = 5.

Resources

On the off chance you’re wondering how I made these really cool plots…

Analysis of the experiment and fitting of the response model to the data was accomplished in the wonderful language R. I couldn’t find any way to do cut-planes in R, and in any case am not terribly good with it. So I used Python(x,y) with the Enthought Tool Suite, in particular the mlab and Mayavi interfaces. Many, many thanks to the good people of the Pythonic world, and especially Enthought, for the quality product they have created. Who needs Matlab anyway?
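
In case it helps, the isosurface pictures boil down to something like the following. The grid and the quadratic used here are stand-ins with made-up coefficients, not the fitted model.

```python
import numpy as np
from mayavi import mlab

# Evaluate a quality model on a grid of coded variables (extending past the
# design space, as in the zoomed-out figure).  Placeholder coefficients only.
T, t, r = np.mgrid[-2:2:40j, -2:2:40j, -2:2:40j]
q = 3.5 - 0.5 * T**2 - 0.3 * t**2 - 0.1 * r**2 - 0.4 * T * t + 0.3 * r

# Isosurfaces of constant q plus a cut plane through the volume
src = mlab.pipeline.scalar_field(T, t, r, q)
mlab.pipeline.iso_surface(src, contours=[1, 2, 3], opacity=0.4)
mlab.pipeline.scalar_cut_plane(src)
mlab.show()
```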

My source code is available upon request.

Coffee: Analysis & Results of the Response Surface

I have executed the experiment design discussed in Coffee: Design of the Experiments (Part 2: RSO). This “response surface method” experiment design is carefully crafted to provide the data necessary to fit a formula of the form

q = b0 T^2 + b1 t^2 + b2 r^2 + b3 T t + b4 t r + b5 T r + b6 T + b7 t + b8 r + b9

This model describes quadratic terms (T^2, t^2, r^2), main effects (T, t, r), and first-order interactions (Tt, tr, Tr). The model basically assumes that quality is locally curved or locally linear, but not an undulating surface. This assumption of local smoothness is a good one in this case—were it not, the experience of buying a cup of coffee would be a dangerous gamble where small changes made by the brewer resulted in big changes in our cup.

Some possible findings in our analysis:

  • There is a maximum value of q at some point in the design space, whose approximate location we can find with the preceding method.
  • There is a maximum value of q at the edge of the design space, that is, we might find that the best q occurs at some maximum value of concentration (a face of the design cube), or even, for example, the maximum value of concentration and temperature (an edge of the design cube), or a maximum of all the values (a corner of the design cube).
  • Some variables don’t matter, and the entire experience is governed by only one variable—indeed the screening experiment hinted at this outcome.

If the optimum is inside the design space, then I know immediately how to brew the best cup of coffee—within some amount of error. If the optimum is outside the design space, at least I know which experimental trend to pursue. Namely, new experiments are positioned along the path of most rapid ascent, or a new design cube is positioned outside the current design cube, with the center moved along the path of most rapid ascent.

Results

The following three graphs show slices through the response surface fit to the experiments. The slices are positioned so that they intersect the “stationary point”. If there is a local maximum, then it is the stationary point; the plots then display the local sensitivity around that stationary point. The quadratic fit can also produce saddle-shaped responses, in which case the center of the saddle is a stationary point.

Note that these plots are all based on the fit to just the response surface design. A different fit is obtained by including the data from the screening results.

The graphs all work in “coded variables”; that is, they are all adjusted so that the values lie between -1 and 1. To decipher these values into measurable numbers, use the following equations. The coded value is indicated with a c subscript.

Time: t_c = 1.35 log10(t) - 2.35

Temperature: T_c = 0.1 (T - 195)

C/W Ratio: r_c = 50 (r - 0.055)

These equations are easily inverted, though below I provide uncoded (normal) values for the points in the steepest ascent.
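
Translated into code, the coding and its inverse are a direct transcription of the equations above:

```python
import numpy as np

def to_coded(t, T, r):
    """Physical units -> coded (-1..+1) variables."""
    tc = 1.35 * np.log10(t) - 2.35      # time in seconds
    Tc = 0.1 * (T - 195.0)              # temperature in deg F
    rc = 50.0 * (r - 0.055)             # C/W ratio in g/ml
    return tc, Tc, rc

def to_actual(tc, Tc, rc):
    """Coded variables -> physical units (inverse of the above)."""
    t = 10.0 ** ((tc + 2.35) / 1.35)
    T = Tc / 0.1 + 195.0
    r = rc / 50.0 + 0.055
    return t, T, r
```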

The fitted space suggests that water temperature is important—certainly more important than the screening experiment suggested. However, my cup quality actually improves as the temperature declines. Almost all the brewing advice I’ve read, including the latest Consumer Reports, harps on the importance of getting the water hot enough. My experiment suggests that modulating extraction duration and C/W ratio is at least as important.

Steepest Ascent

The response surface methodology encourages further experiment design. Starting from the stationary point, you can follow the path of steepest ascent to move toward ever better quality—at least in theory. To my surprise, this worked for a one-trial qualitative test; more on that later. The steepest ascent predicted using the fit to just the RSO data is shown in the next table. Note that the concentration of coffee quickly leaves the design space, which was limited to concentrations less than 0.075.

Steepest (RSO only)

Time [sec]   Coffee [g]   Water [fl oz]   Temp [F]   C/W Ratio [g/ml]   Predicted Q
    55          22.3          13.7           195          0.055             3.7
    69          22.3          12.1           192          0.062             3.9
    90          22.3          10.7           190          0.071             4.1
   116          22.3           9.5           187          0.079             4.3
   148          22.3           8.6           185          0.088             4.4
   188          22.3           7.8           183          0.096             4.6
   238          22.3           7.2           181          0.105             4.7
   301          22.3           6.6           179          0.114             4.8
   380          22.3           6.2           177          0.123             4.9
   479          22.3           5.7           175          0.131             5.0
   601          22.3           5.4           173          0.140             5.0

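For the curious, generating such a path numerically is straightforward: evaluate the gradient of the fitted quadratic in coded variables and take small steps along it, then convert each point back to physical units with the coding equations above. A sketch, with placeholder coefficients rather than the fitted ones:

```python
import numpy as np

# Quadratic model in coded variables x = (t_c, T_c, r_c):
#   q(x) = c + b.x + x.B.x, where B is symmetric and its off-diagonal
#   entries are half the interaction coefficients.  Placeholder numbers.
b = np.array([0.1, -0.4, 0.3])
B = np.array([[-0.5, -0.4,  0.0],
              [-0.4, -0.7, -0.2],
              [ 0.0, -0.2, -0.1]])

x = np.zeros(3)                            # start at the design center
for step in range(10):
    grad = b + 2.0 * B @ x                 # gradient of the quadratic
    x = x + 0.1 * grad / np.linalg.norm(grad)
    dq = float(b @ x + x @ B @ x)          # predicted change in q from the center
    print(step, np.round(x, 3), round(dq, 3))   # convert x back with the coding equations
```
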
A similar fit using all the available data produced the steepest ascent trials in the next table. It is, perhaps, less believable since the quality values rise in excess of 10.

Steepest (RSO + Screening)

Time [sec]   Coffee [g]   Water [fl oz]   Temp [F]   C/W Ratio [g/ml]   Predicted Q
    55          22.3          13.7           195          0.055             3.3
    78          22.3          12.0           193          0.063             3.7
   105          22.3          10.4           192          0.072             4.0
   132          22.3           9.2           191          0.082             4.5
   161          22.3           8.2           190          0.092             5.1
   192          22.3           7.4           190          0.102             5.7
   226          22.3           6.8           190          0.112             6.4
   264          22.3           6.2           189          0.121             7.2
   307          22.3           5.7           189          0.131             8.1
   355          22.3           5.3           189          0.141             9.1
   409          22.3           5.0           188          0.151            10.2

Despite the implausibility, I conducted a trial of the 3rd experiment in the combined table, that is, 105 seconds extraction at 192 F for a 0.072 concentration coffee. My results:

“Quality = 4.2. Toasted marshmallow flavor! Aroma is rich and without char. Acid and bitter are nicely balanced in the first sip but the bitter dominates the aftertaste.”

The quality agreement (and I really didn’t peek) is unexpectedly close—4.2 instead of 4. Furthermore, I was genuinely surprised by entirely new flavors. It really was a great cup of coffee. Damned strong though.

For comparison, the contour plots produced by the fit to the RSO data and the screening data are shown next.

A future post will discuss the goodness of fit, and whether the models are meaningful. For the present, rest assured that predicting a quality of 4 and measuring the same convinces me that the results are useful.

Data

The actual data, along with my notes at the time, is shown in the following table.

Coffee [g]   Water [fl oz]   C/W [g/ml]   Time [sec]   Temp [F]   Trial Order   Q     Date        Comment
  22.8          14.0           0.055         300          205           1        1.5   2/20/2009   Not good. Bitter flavor with simple aroma. Too bitter.
  22.8          22.0           0.035          55          205           2        2.5   2/23/2009   Simple flavor with muted complexity. Aroma and flavor dominated by char. Neither too bitter nor too acid. Not balanced though.
  22.8          14.0           0.055          10          205           3        2.5   2/24/2009   Too bitter. Aroma fairly simple. Mouth feel too watery. Little acidity, poorly balanced between acid and bitter.
  22.8          10.3           0.075          55          185           4        4     2/25/2009   Aroma is excellent w/ complex interesting smells. There is some crema. Nicely acidic and reasonably balanced w/ bitter.
  22.8          14.0           0.055          55          195           5        3.5   3/2/2009    Acid-bitter balance too heavy on bitter side. Nice crema and nice aroma. Rich mouth feel.
  22.8          22.0           0.035          10          195           6        3     3/3/2009    Moderately well balanced. Aroma is simple. Mouth feel watery.
  22.8          14.0           0.055          55          195           7        4     3/4/2009    Well balanced w/ good aroma. A little too bitter, but not unpleasant.
  22.8          10.3           0.075          10          195           8        3.5   3/5/2009    Aroma somewhat simple. Flavor strong or even rich. Too bitter overall and acid/bitter balance poor.
  22.8          14.0           0.055          10          185           9        2     3/6/2009    Bitter simple flavor without much acid. Aroma is fairly good but not as rich as I like.
  22.8          10.3           0.075         300          195          10        3     3/9/2009    Smelled bitter. Taste moderately well balanced with a little too much bitter. Rich feel.
  22.8          22.0           0.035         300          195          11        2.8   3/10/2009   Watery, yet imbalanced toward bitter. Flavor is well-bodied and aroma good.
  22.8          10.3           0.075          55          205          12        2.5   3/11/2009   Burned and charred taste. Too bitter while also being nicely acidic. Not well balanced. Aroma dominated by char.
  22.8          22.0           0.035          55          185          13        2.5   3/12/2009   Aroma dominated by char and relatively uninteresting. Taste is poorly balanced–too bitter. Strong flavor.
  22.8          14.0           0.055         300          185          14        4     3/16/2009   Well balanced and nicely aromatic. A little bitter on the finish.
  22.8          14.0           0.055          55          195          15        3.5   3/17/2009   Bitter flavor with weak mouth feel and low complexity. Aroma nicely balanced though.

Coffee: Design of the Experiments (Part 2: RSO)

In Review

The results of my 4-trial fractional factorial experiment (main effects) showed that extraction time is the most important variable, and that temperature and C/W ratio are less important. In the interest of completeness this experiment will not simply vary extraction time, but rather seek to efficiently find the maximum.

Lessons from the Screening Experiment

I am accustomed to making my coffee with about 22 fluid ounces of water, which makes two nice cups for my morning. During the screening I used 22 fl oz as a baseline, and increased or decreased the amount of coffee accordingly. The result was that some experiments required 48.8 g of coffee. You can’t possibly imagine how much coffee that is. It almost completely fills a coffee cup with just the beans!

In subsequent experiments I will limit the amount of coffee beans to something more modest, perhaps about 20 g, and vary the amount of water. With this more cautious approach I may live to see the final results.

The Design

NIST offers three designs for response surface objective (RSO) experiments. Two are interpretations of the Box-Wilson central composite design, each of which requires 20 total runs. An alternative design reduces the number of experiments to 15, which is the reason I chose it.

Specifically, the Box-Behnken design provides an RSO approach for 3 factors in 15 experiments. That is the same number I originally suggested through my naïve approach, but without all the baggage. Effectively, my original 5 values for each factor get reduced to three, and those are shuffled about in a way that is well-suited to estimating their effects. In my case, the specific experiments would be:

Response Surface Objective

             Actual Values                     Statistics Jargon (coded)
Trial        r (g/ml)   t (sec)   T (F)        r         t         T
1              0.035       10      195        -1        -1         0
2              0.075       10      195        +1        -1         0
3              0.035      300      195        -1        +1         0
4              0.075      300      195        +1        +1         0
5              0.035       55      185        -1         0        -1
6              0.075       55      185        +1         0        -1
7              0.035       55      205        -1         0        +1
8              0.075       55      205        +1         0        +1
9              0.055       10      185         0        -1        -1
10             0.055      300      185         0        +1        -1
11             0.055       10      205         0        -1        +1
12             0.055      300      205         0        +1        +1
13, 14, 15     0.055       55      195         0         0         0

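For reference, the coded design itself is mechanical enough to generate in a few lines. This sketch builds the same 15 runs from the standard Box-Behnken recipe and maps them onto the levels used above:

```python
import itertools
import numpy as np

def box_behnken_3(centers=3):
    """Coded Box-Behnken design for 3 factors: +/-1 on each pair of factors
    with the third held at 0, plus replicated center points."""
    runs = []
    for i, j in itertools.combinations(range(3), 2):
        for a, b in itertools.product((-1, 1), repeat=2):
            run = [0, 0, 0]
            run[i], run[j] = a, b
            runs.append(run)
    runs += [[0, 0, 0]] * centers
    return np.array(runs)

# map coded levels back to the actual settings used here
levels = {
    "r (g/ml)": {-1: 0.035, 0: 0.055, 1: 0.075},
    "t (sec)":  {-1: 10,    0: 55,    1: 300},
    "T (F)":    {-1: 185,   0: 195,   1: 205},
}
for trial, coded in enumerate(box_behnken_3(), start=1):
    actual = [levels[name][c] for name, c in zip(levels, coded)]
    print(trial, coded, actual)
```
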
This set will take me about three weeks. With luck by the time you read this I will have trickled out previous articles, and you will have results in two weeks or less. Sorry for the suspense.

Oh yes, I’m interested in other people’s Q functions too, so if you want to run my exact experiments, please post your data and I will post your results. Public or anonymous, at your discretion.