Defining USE_GENERATIONAL_GC selects a non-moving generational collector with two or three generations. The generational collector divides allocated nodes into generations based on some notion of age; younger generations are collected more frequently than older ones. Memory allocated by R_alloc is maintained on a stack. Code that allocates memory with R_alloc must use vmaxget and vmaxset to obtain and reset the stack pointer.
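The usual discipline, in C code called from R and assuming the declarations in R_ext/Memory.h, looks roughly like the sketch below; the helper function and the work it does are made up for illustration:

    #include <R.h>   /* declares R_alloc, vmaxget and vmaxset */

    /* Hypothetical helper: scratch space from R_alloc is reclaimed by
       resetting the R_alloc stack pointer, not by calling free(). */
    void square_into_scratch(double *x, int n)
    {
        void *vmax = vmaxget();              /* save the R_alloc stack pointer */
        double *tmp = (double *) R_alloc(n, sizeof(double));
        for (int i = 0; i < n; i++)
            tmp[i] = x[i] * x[i];            /* use the scratch space */
        vmaxset(vmax);                       /* release everything allocated above */
    }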
After every LEVEL_0_FREQ level 0 collections a level 1 collection is done; after every LEVEL_1_FREQ level 1 collections a level 2 collection occurs. Thus, roughly, every LEVEL_0_FREQ-th collection is a level 1 collection and every (LEVEL_0_FREQ * LEVEL_1_FREQ)-th collection is a level 2 collection. When a level N collection fails to produce at least MinFreeFrac * R_NSize free nodes and MinFreeFrac * R_VSize free vector space, the next collection will be a level N + 1 collection.
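In outline, the scheduling might be expressed as in the sketch below; the constants shown, the counters and force_level are illustrative stand-ins, not the actual memory.c code:

    /* Illustrative sketch of the level scheduling described above. */
    #define LEVEL_0_FREQ 20              /* illustrative values */
    #define LEVEL_1_FREQ 5
    static double MinFreeFrac = 0.2;     /* illustrative value */
    static double R_NSize, R_VSize;      /* stand-ins for the heap size parameters */

    static int count0 = 0, count1 = 0;
    static int force_level = 0;          /* set when a collection frees too little */

    static int next_collection_level(void)
    {
        int level = force_level;
        if (level < 1 && ++count0 >= LEVEL_0_FREQ) { count0 = 0; level = 1; }
        if (level == 1 && ++count1 >= LEVEL_1_FREQ) { count1 = 0; level = 2; }
        return level;
    }

    /* After a level N collection, escalate the next collection if less than
       MinFreeFrac of the node heap or of the vector heap was recovered. */
    static void check_free_space(int level, double free_nodes, double free_vcells)
    {
        if (free_nodes < MinFreeFrac * R_NSize || free_vcells < MinFreeFrac * R_VSize)
            force_level = level < 2 ? level + 1 : 2;
        else
            force_level = 0;
    }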
Memory for nodes is obtained from malloc in chunks called pages. Pages are grouped into classes according to the size of the vector component of the nodes they contain. Occasionally an attempt is made to release unused pages back to the operating system. When pages are released, a number of free nodes equal to R_MaxKeepFrac times the number of allocated nodes in each class is retained. Pages not needed to meet this requirement are released. An attempt to release pages is made every R_PageReleaseFreq level 1 or level 2 collections.
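A self-contained sketch of the retention rule, with made-up data structures standing in for the real page and size-class bookkeeping:

    #include <stdlib.h>

    static double R_MaxKeepFrac = 0.5;   /* illustrative value */

    typedef struct page {
        struct page *next;
        int nodes_per_page;
        int nodes_in_use;                /* 0 means the page is completely free */
        /* node storage would follow here */
    } page;

    typedef struct {
        page *pages;                     /* all pages in this size class */
        long nodes_allocated;            /* nodes currently in use in this class */
        long nodes_free;                 /* free nodes available in this class */
    } size_class;

    /* Release empty pages back to the OS, but keep enough free nodes to cover
       R_MaxKeepFrac times the number of allocated nodes in the class. */
    static void release_pages(size_class *c)
    {
        long keep = (long) (R_MaxKeepFrac * c->nodes_allocated);
        page **link = &c->pages;
        while (*link != NULL) {
            page *p = *link;
            if (p->nodes_in_use == 0 && c->nodes_free - p->nodes_per_page >= keep) {
                *link = p->next;                 /* unlink the empty page */
                c->nodes_free -= p->nodes_per_page;
                free(p);                         /* return it to the OS */
            } else
                link = &p->next;
        }
    }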
R_NSize and R_VSize are used for triggering collections. The initial values set by defaults or command line arguments are used as minimal values. After full collections these levels are adjusted up or down, though not below the minimal values or above the maximal values, towards maintaining heap occupancy within a specified range. When the number of nodes in use reaches R_NGrowFrac * R_NSize, the value of R_NSize is incremented by R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize. When the number of nodes in use falls below R_NShrinkFrac * R_NSize, R_NSize is decremented by R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize. Analogous adjustments are made to R_VSize. This mechanism for adjusting the heap size constants is very primitive but hopefully adequate for now; some modeling and experimentation would be useful. We want the heap sizes to be set at levels adequate for the current computations. The present mechanism uses only the size of the current live heap to estimate the current needs; since the current live heap size can be very volatile, the adjustment mechanism only makes gradual adjustments. A more sophisticated strategy would use more of the live heap history.
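The adjustment rule, written out as a sketch with assumed default values (the vector heap gets the analogous treatment; for simplicity everything is kept in doubles here):

    /* Sketch of the node-count adjustment described above; orig_R_NSize stands
       for the initial (minimal) value. */
    static double R_NSize, orig_R_NSize, R_MaxNSize;
    static double R_NGrowFrac = 0.70, R_NShrinkFrac = 0.30;
    static double R_NGrowIncrMin = 40000, R_NGrowIncrFrac = 0.05;
    static double R_NShrinkIncrMin = 0, R_NShrinkIncrFrac = 0.20;

    static void adjust_nsize(double nodes_in_use)
    {
        if (nodes_in_use > R_NGrowFrac * R_NSize)
            R_NSize += R_NGrowIncrMin + R_NGrowIncrFrac * R_NSize;
        else if (nodes_in_use < R_NShrinkFrac * R_NSize)
            R_NSize -= R_NShrinkIncrMin + R_NShrinkIncrFrac * R_NSize;

        /* never fall below the initial (minimal) value or exceed the maximum */
        if (R_NSize < orig_R_NSize) R_NSize = orig_R_NSize;
        if (R_NSize > R_MaxNSize)   R_NSize = R_MaxNSize;
    }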
The benchmark is the nuke example from the help page of the boot package:

    library(boot)
    data(nuclear)
    nuke <- nuclear[, c(1, 2, 5, 7, 8, 10, 11)]
    nuke.lm <- glm(log(cost) ~ date + log(cap) + ne + ct + log(cum.n) + pt,
                   data = nuke)
    nuke.diag <- glm.diag(nuke.lm)
    nuke.res <- nuke.diag$res * nuke.diag$sd
    nuke.res <- nuke.res - mean(nuke.res)
    nuke.data <- data.frame(nuke, resid = nuke.res, fit = fitted(nuke.lm))
    new.data <- data.frame(cost = 1, date = 73.00, cap = 886, ne = 0,
                           ct = 0, cum.n = 11, pt = 1)
    new.fit <- predict(nuke.lm, new.data)
    nuke.fun <- function(dat, inds, i.pred, fit.pred, x.pred) {
        assign(".inds", inds, envir = .GlobalEnv)
        lm.b <- glm(fit + resid[.inds] ~ date + log(cap) + ne + ct +
                    log(cum.n) + pt, data = dat)
        pred.b <- predict(lm.b, x.pred)
        remove(".inds", envir = .GlobalEnv)
        c(coef(lm.b), pred.b - (fit.pred + dat$resid[i.pred]))
    }
    for (i in 1:20)
        print(system.time(boot(nuke.data, nuke.fun, R = 999, m = 1,
                               fit.pred = new.fit, x.pred = new.data)))

The results for the first, tenth and twentieth runs on several architectures are given in the tables below. We can improve locality of reference by the simple trick of sorting the free lists, either on each older generation collection or on each full collection. Sorting on each full collection seems sufficient, so I have adopted that approach (a sketch of such an address sort follows the tables).
              |        PPC          |    PPC, bytecode    |       AMD-K6
              |  1st    10th   20th |  1st    10th   20th |  1st    10th   20th
    no sort   | 50.91  51.67  51.82 | 40.86  48.84  49.88 | 64.10  70.36  72.87
    all old   | 51.62  50.96  50.95 | 39.96  39.82  39.94 | 62.72  61.95  61.51
    full only | 51.47  50.94  51.13 | 40.16  40.41  40.77 | 62.59  62.27  62.47

              |        PPC          |    PPC, bytecode    |       AMD-K6
              |  1st    10th   20th |  1st    10th   20th |   1st     10th    20th
    no sort   | 38.58  41.75  42.86 | 72.05  79.03  82.98 | 152.63  164.84  177.63
    all old   | 37.21  36.86  36.97 | 72.16  72.23  72.48 | 151.28  150.63  151.02
    full only | 37.66  37.43  37.57 | 72.95  73.16  74.13 | 151.25  150.37  151.30
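A minimal sketch of such an address sort, assuming the free cells are kept on a singly linked list; in the collector itself the sort would more likely be folded into the sweep, but the idea is the same:

    #include <stdint.h>
    #include <stdlib.h>

    /* Rebuild a singly linked free list in increasing address order.  The
       freenode layout is a placeholder for the real free cell. */
    typedef struct freenode { struct freenode *next; } freenode;

    static int addr_cmp(const void *a, const void *b)
    {
        uintptr_t x = (uintptr_t) *(freenode * const *) a;
        uintptr_t y = (uintptr_t) *(freenode * const *) b;
        return (x < y) ? -1 : (x > y) ? 1 : 0;
    }

    static freenode *sort_free_list(freenode *head)
    {
        size_t n = 0, i;
        for (freenode *p = head; p != NULL; p = p->next) n++;
        if (n < 2) return head;

        freenode **v = malloc(n * sizeof *v);
        if (v == NULL) return head;          /* out of memory: leave unsorted */
        i = 0;
        for (freenode *p = head; p != NULL; p = p->next) v[i++] = p;
        qsort(v, n, sizeof *v, addr_cmp);

        for (i = 0; i + 1 < n; i++) v[i]->next = v[i + 1];
        v[n - 1]->next = NULL;
        head = v[0];
        free(v);
        return head;
    }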
R_VSize and R_NSize are restricted to remain below the values of R_MaxVSize and R_MaxNSize. Initially these maximal values are both set to INT_MAX, which ensures that the size counters will not wrap on systems with that much memory. On systems with less memory this means the heap will continue to grow until a malloc call fails. In many cases this may be adequate, but there are situations where paging can lead to serious system performance degradation before malloc fails. The functions R_SetMaxVSize and R_SetMaxNSize can be used to set new maximal values; the values will be kept above the current use values. A problem with the current arrangement is that before initialization R_VSize is measured in bytes, while after initialization it is measured in vector cells (typically 8 bytes); the values are also reported to the user in bytes. These conversions create the possibility of integer overflow and are messy. We should change this so that the internal representation is always in vector cells.
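To illustrate the unit problem, here is a sketch of a byte-based limit setter that clamps rather than wraps and respects current use; the names and the byte interface are assumptions, and the real R_SetMaxVSize need not look like this:

    #include <limits.h>

    #define VCELL_BYTES 8                /* a vector cell is typically 8 bytes */

    static long max_vcells;              /* maximum, in vector cells */
    static long vcells_in_use;           /* stand-in for current usage */

    static void set_max_vsize_bytes(double bytes)
    {
        double cells = bytes / VCELL_BYTES;
        if (cells > (double) LONG_MAX)
            cells = (double) LONG_MAX;   /* clamp instead of wrapping */
        if (cells >= (double) vcells_in_use)
            max_vcells = (long) cells;   /* keep the limit above current use */
    }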
Hitting a memory limit raises an error that aborts the current computation. With a more sophisticated error handling system it would be possible to make the limit soft, in the sense of raising a recoverable error so that a user-defined error handler could increase the limit and allow the calculation to continue.
It might be a good idea to bump the maximal limit a bit when it is hit, to ensure there is enough headroom to do any necessary cleanup. We haven't done this yet since it is a pain to do, and so far it looks like enough room to delete big variables is usually available even in artificial test cases. Here is one that triggers a node cell overflow:
    e <- NULL
    while (TRUE) {
        ee <- new.env()
        assign("x", e, envir = ee)
        e <- ee
    }
R_NSize and R_VSize, as well as several other size-related variables in the collector, are defined as int. This is not ideal for 64-bit architectures, where pointers and longs are 64 bits but int is often 32 bits. These definitions should be changed to long (or perhaps something like size_t); I believe there are gcc warning options that can help track down incompatibilities. There are a few places where heap adjustments use occupancy fractions expressed as doubles. I think these will work fine even for 64-bit integers, but this should be reviewed.
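For example (R_size_t is an assumed name, not necessarily what the sources will end up using):

    #include <stddef.h>

    /* Sketch: widen the size-related globals.  size_t is one candidate;
       long would also work on most 64-bit platforms. */
    typedef size_t R_size_t;

    static R_size_t R_NSize, R_VSize;
    static R_size_t R_MaxNSize, R_MaxVSize;

    /* The occupancy fractions can remain doubles: a double represents integers
       exactly up to 2^53, so products such as R_NGrowFrac * R_NSize stay
       accurate even with 64-bit sizes. */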
The current upper limits of INT_MAX on R_VSize and R_NSize are intended to prevent wrapping on systems with huge amounts of memory. They limit the number of nodes and the number of vector cells to about 2,000,000,000 each.
We should review all uses of integer variable types for appropriate size and for possible overflows.
There are several measures of memory use to consider. First, there is the pair of threshold values (R_NSize, R_VSize). They represent limits up to which allocation can occur without a GC. Second, there is the total amount of memory allocated by malloc. This will fluctuate more than the (R_NSize, R_VSize) pair because of efforts to release pages back to the OS. It will usually be well below the limit implied by (R_NSize, R_VSize) right after a GC. Right before a GC it could be above, due to granularity in the page allocation for small nodes, but it will usually be below because the GC will typically be triggered by only one of the thresholds being reached. Currently the total malloc allocation is not recorded, but it could be computed easily if we wanted to (and we may want to if we decide to control total malloc allocation directly); a sketch of such tracking is given below.
Third, there is the actual memory footprint of R in the OS. This is the number that top or the Windows Task Manager should be reporting, if they are accurate (the Task Manager may be, but I think top needs to be taken with a grain of salt). What happens here depends a lot on the quality of the malloc implementation, in particular on whether calls to free are able to reduce the memory footprint.
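Returning to the second measure: if we did decide to record the total malloc allocation, a minimal sketch would route page and large-vector allocations through counting wrappers (the names are illustrative):

    #include <stdlib.h>

    /* Keep a running total of malloc'd bytes. */
    static size_t malloc_in_use = 0;

    static void *tracked_malloc(size_t bytes)
    {
        void *p = malloc(bytes);
        if (p != NULL) malloc_in_use += bytes;
        return p;
    }

    static void tracked_free(void *p, size_t bytes)
    {
        if (p != NULL) malloc_in_use -= bytes;
        free(p);
    }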
For limiting memory footprint we would ideally like to control the third measure, but this is quite difficult on Unix and Windows, or any system with real VM. Controlling malloc is a surrogate that may be adequate.
For preventing a computation from going out of control, using R_MaxNSize and R_MaxVSize should be adequate. It might even be sufficient for controlling memory footprint.
The current approach assumes that a malloc ought to succeed if it is permitted to occur by the current R_NSize and R_VSize settings. If it fails, an error is signaled immediately. With a malloc limit (and maybe even without one) it would make sense to go through a few intermediate steps after a malloc fails and before signaling an error:
- If a malloc fails, run a garbage collection and try the allocation again.
- If malloc still fails, try to release all possible pages and try again.
- If malloc still fails, then signal an error.

A sketch of this staged retry is given below. As always with resource exhaustion errors, an issue is that after signaling the error the problem may still exist, so it may not be possible to do anything about it. A scheme that allows a small reserve for processing the error and cleaning up may be needed.
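A minimal sketch of that staged retry, with placeholder names standing in for whatever the collector actually provides:

    #include <stdlib.h>

    /* Stand-ins for collector facilities; the names are illustrative. */
    extern void run_full_gc(void);
    extern void release_all_free_pages(void);
    extern void mem_error(const char *msg);

    static void *alloc_with_retries(size_t bytes)
    {
        void *p = malloc(bytes);
        if (p == NULL) {                 /* first, run a full collection */
            run_full_gc();
            p = malloc(bytes);
        }
        if (p == NULL) {                 /* then release all free pages */
            release_all_free_pages();
            p = malloc(bytes);
        }
        if (p == NULL)                   /* finally give up */
            mem_error("memory exhausted");
        return p;
    }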
The Memory help page needs to be adjusted once the interface for tuning the GC is established.