## 11 thoughts on “Sampling design combinatorics”

1. I haven’t thought this through very seriously, but your problem looks similar to the Ising-Potts magnetism models studied in statistical mechanics. Each species could be associated with a different spin (give them values equal to the three complex cube roots of unity), and you would have some potential for the whole system that sums up how many “errors” there are in terms of same-species neighbour interactions. Given a constraint on the frequency of each species, I suspect the frequencies of the different neighbourhoods would just take care of themselves. Or you might also add some global potential accounting for neighbourhoods, but that would probably complicate the algorithm a lot.

To optimize the associated potential (and thus obtain the map with the fewest errors), you could use simulated annealing (sticking with statistical mechanics…) or a genetic algorithm that puts different configurations in competition (the potential then becomes a proxy for fitness).
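As a rough illustration of this idea (not Francois’s exact formulation): a simulated annealing sketch in Python that uses the number of same-species neighbour pairs as the potential to minimize. The species frequencies are held fixed by only ever proposing swaps of two cells, so the frequency constraint needs no explicit handling; all function names and parameter values here are illustrative assumptions.

```python
import math
import random

def conflicts(grid):
    """Count orthogonally adjacent cell pairs holding the same species."""
    n = len(grid)
    return sum(
        (i + 1 < n and grid[i][j] == grid[i + 1][j])
        + (j + 1 < n and grid[i][j] == grid[i][j + 1])
        for i in range(n) for j in range(n)
    )

def anneal(n=12, k=3, steps=50000, t0=2.0, cooling=0.9999, seed=1):
    rng = random.Random(seed)
    # equal frequencies: each species occurs n*n/k times
    cells = [s for s in range(k) for _ in range(n * n // k)]
    rng.shuffle(cells)
    grid = [cells[i * n:(i + 1) * n] for i in range(n)]
    energy, t = conflicts(grid), t0
    for _ in range(steps):
        if energy == 0:
            break
        # propose swapping two random cells; species counts stay constant
        i1, j1 = rng.randrange(n), rng.randrange(n)
        i2, j2 = rng.randrange(n), rng.randrange(n)
        grid[i1][j1], grid[i2][j2] = grid[i2][j2], grid[i1][j1]
        new_energy = conflicts(grid)  # full O(n^2) recount; fine for small grids
        if new_energy <= energy or rng.random() < math.exp((energy - new_energy) / t):
            energy = new_energy
        else:  # reject the move: swap back
            grid[i1][j1], grid[i2][j2] = grid[i2][j2], grid[i1][j1]
        t *= cooling
    return grid, energy
```

Whether zero conflicts is reached within `steps` depends on the cooling schedule, but a conflict-free 12×12 layout with equal frequencies does exist (e.g. cycling the three species along diagonals), so the annealer has a target to find.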

• Hi Francois, thanks for the comment!

Sounds like an interesting idea that we could try out. One thing that worries me a bit is the SA optimization with constraints for this problem: I don’t think we’ll get this out of the box, or do you know software that would do this? But I guess that is something one could also do by hand; we did a similar Ising-model optimization with constraints in an older paper http://arxiv.org/abs/0808.0111 (Appendix A), and I don’t remember it being a problem.

2. Jörg says:

Here is an R script which does the layout. However, it happens quite often that it runs out of subplots; warnings are issued then. It’s quick and dirty, maybe you find someone to optimize it.

```r
# plot layout
x = 12
y = 12
# number of different samples
n = 3

# plot A, which holds the layout
A = array(NA, c(x, y))
# repeat 1 through n as many times as there are subplots in A
s = rep(1:n, x * y / n)

# stop after kmax iterations in the while loop
kmax = 10000

# loop over subplots in A
for (i in 1:x) {
  for (j in 1:y) {
    # whether a suitable subplot was found
    found = FALSE

    # number of remaining subplots
    ns = length(s)
    # counter to break a possible infinite while loop
    k = 0
    while (!found) {
      k = k + 1
      if (k > kmax) break
      # choose a sample from the remaining subplots
      # (note: sample(s, 1) on a length-1 numeric vector would sample from 1:s)
      A[i, j] = if (length(s) == 1) s else sample(s, 1)
      # assume it suits
      found = TRUE
      # if it doesn't, reset
      if (i > 1 && A[i, j] == A[i - 1, j]) found = FALSE
      if (j > 1 && A[i, j] == A[i, j - 1]) found = FALSE
    }
    # warn if equal subplots end up next to each other
    if (found == FALSE) warning("*** layout not possible, ran out of possible subplots! ***")

    # break if the last subplot is gone
    if (length(s) == 1) break
    # first remove all subplots equal to the chosen one,
    s = s[!s %in% A[i, j]]
    # then append one less of them to the remaining subplots
    s = append(s, rep(A[i, j], ns - length(s) - 1))
  }
}
# text and image output
print(A)
for (i in 1:n)
  print(paste(i, ": ", length(A[A == i]), sep = ""))
image(A)
```

• Hi Jörg,

excellent, this seems to work, at least with the basic constraints. What I would like to check still is whether the neighborhood conditions are balanced, but maybe this is not so important anyway. I have put the code on GitHub https://gist.github.com/florianhartig/8433880 . If you want, write me your name via email so that my colleague can acknowledge you in case this is used.
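The balance check Florian mentions could be done by tallying how often each unordered species pair occurs among adjacent subplots. This Python sketch is an illustrative assumption, not part of the posted script:

```python
from collections import Counter

def neighbour_pair_counts(grid):
    """Tally unordered species pairs over all orthogonally adjacent cells."""
    rows, cols = len(grid), len(grid[0])
    pairs = Counter()
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:  # vertical neighbour below
                pairs[tuple(sorted((grid[i][j], grid[i + 1][j])))] += 1
            if j + 1 < cols:  # horizontal neighbour to the right
                pairs[tuple(sorted((grid[i][j], grid[i][j + 1])))] += 1
    return pairs
```

For a perfectly balanced 3-species design with no same-species neighbours, the three mixed pairs would each account for one third of the adjacencies.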

3. Here’s my answer, Florian. It’s got some basic error handling, and it’s generalizable. I’ve gotten it to work reasonably quickly on grids up to about 100 x 100 (then I got bored). It’s essentially “smart brute force” (brute-force sampling with some probability guidelines to make the sampling more efficient). You can see my gist here: https://gist.github.com/emhart/8515826

• Hi Ted,

thanks for that, it’s amazing for us to see that so many people take an interest in this problem.

I’ll look at the code in more detail tomorrow. Cheers, Florian

• Yeah, I was just reading blogs with the Sunday morning coffee and it seemed like a fun problem to tackle. Like a crossword but a bit more rewarding.

• Hi Ted, just to confirm that your code works excellently, thumbs up!

4. I’d like to say thanks to all (meaning Francois, Jörg and Ted) that posted answers here, it has been really great to get so many good suggestions. I hope that the posted code will be useful for others as well!

In case that is of interest: the experiment that is currently in planning and that triggered this question is conducted by Michael Scherer-Lorenzen http://www.geobotanik.uni-freiburg.de/Team-Ordner/mscherer , so watch out for results on 3-species competition on 12×12 grids in the years to come!

5. Deinst says:

I have looked at generating such 12×12 tables uniformly. After spending some time trying to get a good Markov chain that was suitable for coupling from the past (trickier than I thought), I gave up and counted them directly.

There are only 8.43e+26 ways (843255228025765198466017020, to be exact) to color a 12×12 grid with 3 colors with no two adjacent cells colored the same way. This was computed using a dynamic programming approach; see https://bitbucket.org/deinst/tricolor/ for incomprehensible, inefficient code (I intend to clean both problems up, time permitting).
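The row-by-row dynamic programme described here can be sketched as follows (this Python version is an independent illustration, not the linked implementation): enumerate all rows with no horizontal repeat, then propagate counts row by row, keeping only vertically compatible row pairs.

```python
from itertools import product

def valid_rows(width, k=3):
    """All rows of length `width` with no two horizontally adjacent cells equal."""
    return [r for r in product(range(k), repeat=width)
            if all(r[i] != r[i + 1] for i in range(width - 1))]

def count_colorings(width, height, k=3):
    """Count proper k-colorings of a width x height grid graph by DP over rows."""
    rows = valid_rows(width, k)
    counts = {r: 1 for r in rows}  # ways to build a 1-row grid ending in row r
    for _ in range(height - 1):
        counts = {
            r: sum(c for prev, c in counts.items()
                   if all(a != b for a, b in zip(prev, r)))
            for r in rows
        }
    return sum(counts.values())
```

On small grids this can be checked against brute force; for the full 12×12 case the quadratic loop over 3·2¹¹ = 6144 row states per step is too slow as written and would need precomputed compatibility lists or a cell-by-cell frontier DP, presumably closer to what the linked code does.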

If sampling tables uniformly is important, one can use the dynamic programming tables used to count the number of tables to generate uniform tables quickly (initially generating the dynamic programming tables is slow, though). The brute-force selection methods do a relatively poor job of approximating uniform sampling — there are more than 2000 times as many tables with first row “1 0 1 0 1 2 1 2 0 2 0 2” as with “0 2 1 0 2 1 0 2 1 0 2 1”.
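The trick alluded to here can be sketched like this (again an illustrative Python assumption, not the linked code): run the row DP forward, then draw rows from last to first with probability proportional to the stored counts, which yields an exactly uniform table.

```python
import random
from itertools import product

def valid_rows(width, k=3):
    """All rows of length `width` with no two horizontally adjacent cells equal."""
    return [r for r in product(range(k), repeat=width)
            if all(r[i] != r[i + 1] for i in range(width - 1))]

def sample_uniform(width, height, k=3, rng=random):
    """Draw one table uniformly from all proper k-colorings of the grid."""
    rows = valid_rows(width, k)
    compatible = {r: [p for p in rows
                      if all(a != b for a, b in zip(p, r))] for r in rows}
    # forward pass: layers[i][r] = number of valid (i+1)-row grids ending in row r
    layers = [{r: 1 for r in rows}]
    for _ in range(height - 1):
        prev = layers[-1]
        layers.append({r: sum(prev[p] for p in compatible[r]) for r in rows})
    # backward pass: draw the last row, then each earlier row, weighted by counts
    grid = [None] * height
    weights, choices = layers[-1], rows
    for i in range(height - 1, -1, -1):
        r = rng.choices(choices, weights=[weights[c] for c in choices])[0]
        grid[i] = list(r)
        if i > 0:
            weights, choices = layers[i - 1], compatible[r]
    return grid
```

Only rows compatible with the row just drawn are candidates at each backward step, so the output always satisfies the adjacency constraint, and weighting by the forward counts makes every valid table equally likely.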

I believe that once some obvious (and much-needed) optimizations are done, the code can be modified to generate tables where each of the 6 types of two horizontally adjacent cells occurs the same number of times. I’m not sure how to make even stronger restrictions on neighborhood abundance.

• Hi David,

843255228025765198466017020 options, crazy.

“The brute force selection methods do a relatively poor job of approximating uniform sampling — there are more than 2000 times as many tables with first row ‘1 0 1 0 1 2 1 2 0 2 0 2’ as ‘0 2 1 0 2 1 0 2 1 0 2 1’”

–> Yes, that reflects my initial guess that there are some neighborhood combinations that appear more often than others. I’m not quite sure whether Michael minds for his experiment, but for an ideal statistical design it would of course be great to balance neighborhood conditions, or whatever else you are interested in. I guess once you have a representative list of solutions, you can always use stratified sampling to draw from them in a way that is balanced according to whatever you want to test.

Thanks for your code – I begin to wonder whether this thread demands a rejoinder at some stage to summarize all the ideas that have come up in the comment section.