IMPORTANT UPDATE 16.4.: Since this was posted, Jan Thiele has made some improvements to RNetLogo, and a new version with improved performance for large data transfers will be out soon. Jan also noted that the new NetLogo version 5.0 is considerably faster than the 4.x line that was used to create the figures below, so if runtime is critical, switching to 5.0 might already do the job for you (5.x has improved the list handling that is used to get the data into R). I’ll update this post when the new version becomes available.

**ADDITIONAL UPDATE 30.05.: The information below is largely obsolete. See here for the new information promised above.**

I was playing around with the RNetLogo package today. For those who don’t know: NetLogo is a modeling platform for agent-based (aka individual-based) models. It has a lot of nice features, but I guess what people like most about it is that one can have a first prototype of a model very fast, usually within hours, which is great for trying out new ideas or for teaching. A downside used to be that complicated mathematical / statistical calculations are often somewhat difficult to build from the limited library of NetLogo functions, and they can also be quite slow (at least in my experience). However, there is the NetLogo API, which can be used to make calls from and to NetLogo. Jan Thiele used it for his NetLogo-R-extension (which calls R from NetLogo) and for his more recent RNetLogo package, which allows controlling NetLogo from R. This is a great advance.

I was interested in seeing whether the RNetLogo interface is fast enough for getting a lot of detailed data from NetLogo into R, for example the state of each individual at each time step for a large number of individuals. As a test, I timed three options for getting the x and y coordinates of all individuals in a model with 5000 individuals (for syntax see the manual):

    > ptm <- proc.time()
    > test1 <- NLGetAgentSet(c("xcor","ycor"),"turtles")
    > proc.time() - ptm
       user  system elapsed
       4.19    0.00    4.24
    >
    > ptm <- proc.time()
    > test2 <- NLReport("[(list xcor ycor)] of turtles")
    > proc.time() - ptm
       user  system elapsed
       4.22    0.01    4.62
    >
    > ptm <- proc.time()
    > test3 <- data.frame(NLReport("[xcor] of turtles"),NLReport("[ycor] of turtles"))
    > proc.time() - ptm
       user  system elapsed
       0.28    0.02    0.22

and for 20,000 individuals:

    > ptm <- proc.time()
    > test1 <- NLGetAgentSet(c("xcor","ycor"),"turtles")
    > proc.time() - ptm
       user  system elapsed
      18.93    0.04   18.67
    >
    > ptm <- proc.time()
    > test2 <- NLReport("[(list xcor ycor)] of turtles")
    > proc.time() - ptm
       user  system elapsed
      19.47    0.08   19.75
    >
    > ptm <- proc.time()
    > test3 <- data.frame(NLReport("[xcor] of turtles"),NLReport("[ycor] of turtles"))
    > proc.time() - ptm
       user  system elapsed
       3.25    0.00    3.23

So, considering that we are talking about something like 2*N*8 bytes (two doubles per individual), this is obviously slow, way slower than a call between R and C or R and Python would be, but I guess still acceptable for many applications. The third option, reporting each coordinate separately, is much faster than reporting them together. That is a little surprising by usual programming logic, because this way NetLogo has to go through its list of individuals twice. I could imagine that the list command of NetLogo is the culprit.
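To put "slow" into numbers, here is a back-of-the-envelope calculation of the payload size and the effective transfer rate. It is only a sketch: the variable names are mine, and the inputs are just the elapsed times for 20,000 individuals quoted above.

```r
# Payload: two doubles (xcor, ycor) per individual, 8 bytes each
n_agents <- 20000
payload_bytes <- 2 * n_agents * 8

# Elapsed times in seconds for 20,000 individuals, copied from the output above
elapsed <- c(
  NLGetAgentSet      = 18.67,
  list_reporter      = 19.75,
  separate_reporters = 3.23
)

# Effective transfer rate in kilobytes per second
rate_kB_per_s <- payload_bytes / elapsed / 1000

print(payload_bytes)
print(round(rate_kB_per_s, 1))
```

The payload is only 320,000 bytes, so even the fastest variant moves just under 100 kB per second.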

Anyway, simple reporters seemed the way to go, so I made a few further tests to see how this option scales with the number of individuals. Below you see the plot. Unfortunately, the scaling on my laptop was roughly quadratic. I’m not sure whether this is because of the interface or because of NetLogo’s memory access: NetLogo seemed to do quite a bit of hard disk access for larger numbers of individuals.
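One quick way to check the "roughly quadratic" impression is to estimate the scaling exponent from timings at different population sizes. The sketch below uses only the two elapsed times for the separate-reporter variant quoted earlier in this post, not the full data behind the plot, so it is a crude estimate. With more measurements the same log-log fit would simply use more rows.

```r
# Elapsed times in seconds for the separate-reporter variant (values from this post)
timings <- data.frame(
  n       = c(5000, 20000),
  elapsed = c(0.22, 3.23)
)

# If time ~ a * N^b, then log(time) = log(a) + b * log(N),
# so the slope of a log-log fit estimates the exponent b.
fit <- lm(log(elapsed) ~ log(n), data = timings)
exponent <- unname(coef(fit)[2])

print(round(exponent, 2))
```

The slope comes out close to 2, which is what quadratic scaling would give.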

ADDITION: Jan Thiele pointed out that something very close to the faster version 3 is already implemented in RNetLogo. Using the “as.data.frame=TRUE” argument in NLGetAgentSet (see below, 20,000 individuals) yields runtimes very similar to version 3.

    > ptm <- proc.time()
    > test4 <- NLGetAgentSet(c("xcor","ycor"),"turtles", as.data.frame=TRUE, df.col.names=c("xcor","ycor"))
    > proc.time() - ptm
       user  system elapsed
       3.59    0.00    3.47

As this result is already ordered, this option is probably usually preferable to version 3, where the x and y coordinates each come back in random order, so values at the same position do not necessarily belong to the same agent. I made a few checks, and the scaling with the number of individuals seems to remain the same with this variant. Thanks to Jan for the comments!
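The ordering problem with version 3 is easy to reproduce without NetLogo. The following is a toy simulation in plain R, not RNetLogo code: `agents` and `report_of` are hypothetical stand-ins I made up to mimic a reporter that returns its values in a new random order on every call.

```r
set.seed(1)

# Hypothetical stand-in for the turtles of a NetLogo model
agents <- data.frame(
  who  = 0:4,
  xcor = c(1, 2, 3, 4, 5),
  ycor = c(10, 20, 30, 40, 50)
)

# Stand-in for a reporter like "[xcor] of turtles":
# every call returns the values in its own random order
report_of <- function(column) agents[[column]][sample(nrow(agents))]

# Two separate calls: element i of x and element i of y need not be the same agent
separate <- data.frame(xcor = report_of("xcor"), ycor = report_of("ycor"))

# One shuffle applied to whole rows keeps each agent's values together
together <- agents[sample(nrow(agents)), c("xcor", "ycor")]

# In this toy data ycor is always 10 * xcor, which lets us check the pairing
print(all(together$ycor == 10 * together$xcor))
print(all(separate$ycor == 10 * separate$xcor))
```

The first check always holds because whole rows are moved together. The second one only holds if the two independent shuffles happen to coincide, which is the situation version 3 is in.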
