NONMEM Users Network Archive

Hosted by Cognigen

Re: Linear speedup of NONMEM on quad-core CPUs?

From: Steve Chapel <steven.chapel>
Date: Wed, 14 Mar 2007 10:30:10 -0400

I didn't even think about disk I/O. I was more concerned about front
side bus activity being a bottleneck. I found that NONMEM used only 2.4
MB of memory, but of course this would depend on computer architecture,
compiler options, sizes of arrays, and so on. My impression is that the
bus should not be a bottleneck, because the 4 MB shared cache of each
dual core should be able to hold the most frequently accessed memory, as
you point out. It's good that someone has determined that this really is
the case.
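
One way to check this on a given box is to time k identical CPU-bound processes side by side and compare throughput with a single process. The following is only a minimal sketch: `burn` is a hypothetical stand-in workload, not NONMEM, and the work size is an arbitrary assumption. A real test would launch actual NONMEM runs instead.

```python
import multiprocessing as mp
import time

def burn(n):
    """Hypothetical CPU-bound stand-in for one run: floating-point loop, no disk I/O."""
    x = 0.0
    for i in range(1, n + 1):
        x += 1.0 / (i * i)
    return x

def wall_time(n_procs, work):
    """Wall-clock seconds to finish n_procs independent copies of burn(work)."""
    start = time.perf_counter()
    with mp.Pool(processes=n_procs) as pool:
        pool.map(burn, [work] * n_procs, chunksize=1)
    return time.perf_counter() - start

def relative_throughput(t_one, t_k, k):
    """Runs completed per unit time with k processes, relative to one process."""
    return k * t_one / t_k

if __name__ == "__main__":
    work = 1_000_000  # arbitrary work size (assumption)
    t1 = wall_time(1, work)
    for k in (1, 2, 4):
        tk = wall_time(k, work)
        # Close to k means near-linear scaling; well below k suggests a shared bottleneck.
        print(k, round(relative_throughput(t1, tk, k), 2))
```

If the reported value stays close to k up to the number of cores, the bus and cache are probably not limiting for that workload.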

As for why go the quad-core route, I'm going to need a very reliable
system so I want servers with Xeons anyway. Noise is not a concern,
because these servers are going to get their own room. Comparing the
cost and power of two single quad-core systems vs. one system with two
quad-cores, it looks like the dual quad-core system will cost less and
should use less power. If performance is no worse, it makes economic
sense to pack as many cores into each box as possible.
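
For a rough sense of the arithmetic, the prices quoted in Mark's message below (about $700 for a dual-core Dell, about $2000 for a quad core) can be turned into a cost per core. This is only an illustration: it treats both figures as comparable prices, which is an assumption, and it ignores power, clock speed, and the rest of the server hardware.

```python
def price_per_core(price_usd, cores):
    """Rough cost per core in US dollars."""
    return price_usd / cores

# Approximate 2007 prices taken from the quoted message; the labels are illustrative.
options = {
    "dual-core desktop": (700, 2),
    "quad-core": (2000, 4),
}

for name, (price, cores) in options.items():
    print(f"{name}: ${price_per_core(price, cores):.0f} per core")
# dual-core desktop: $350 per core
# quad-core: $500 per core
```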

-- Steve


Mark Sale - Next Level Solutions wrote:
> Steve,
> It really, really should be the case that speed up for multiple
> simultaneous runs is linear. In looking at it for many years, NONMEM
> execution really is consistently proportional to benchmarks like
> specfp95. It seems that disc I/O is trivial, the entire data set can
> typically be put into cache on modern machines. I have noted
> differences between "cheaper" 2.8 GHz dual core machines (Dell E510)
> and "better" 2.8 GHz machines I've gotten (from Gateway). But, if you
> look at the specfp95 (http://www.spec.org/cpu95/results/cfp95.html),
> there are differences between machines using the same CPU - I can't
> claim to understand why. Memory should not be an issue - NONMEM
> typically uses less than 5 MB of memory.
>
> I have done what you ask (I think) in two stages, but not the whole
> thing:
> Dual core does increase run speed (1/time) linearly (note that dual core
> are typically a little slower clock speed) for 2 processes - this is
> what I currently run.
> 4 processor (single core - a Proliant 4 processor server running Windows
> Server 2000) machine does increase run speed (1/time) linearly, for four
> processes.
>
> The Intel quad core is just two dual core processors stuck together with
> a single front side bus, they don't share cache or registers. This
> probably is better for NONMEM than the AMD approach, sharing registers,
> since separate NONMEM runs obviously don't need to share anything. (the
> Intel approach is worse for games, since latency to cache memory is
> worse)
>
> But, a 4 processor dual core will cost you > $12,000, and will not use
> less power than 4 dual core boxes - why go to the quad processor?
> (Trust me, it won't make less noise either) You can buy 4 dual core
> boxes, set up a LAN and map the c: drive on one "main" machine to all
> the machines (so from the "main" machine, everything looks like it is
> happening on the local drive, when in fact execution is happening on
> the other machines), use remote desktop to control all 4 computers from
> one monitor/mouse/keyboard. A dual core Dell is about $700. Best price
> for quad core right now is about $2000 (i.e., more $/GHz than dual core).
> The current Intel quad core is intended for servers, and is expensive.
> The desktop version is due out late this year - should be cheaper and
> prices will probably come down when AMD comes out with their quad core
> CPU.
>
> Brian,
> Your observation (if I understand correctly that you are talking
> about running only one NONMEM run) is a little surprising; NONMEM is
> single threaded. So the current approach to parallel computing
> (multithreading) isn't going to happen. The parallel option on the
> Intel compiler can, in theory, "unroll" loops in Fortran. But, in
> reality, the code has to be specifically written to do this, and NONMEM
> certainly is not. I tried this, in collaboration with Silicon Graphics
> about 10 years ago (who claimed to have the best parallel compiler
> around, right before they went out of business), and got zero
> parallelization for a single run of NONMEM. But this was a long time
> ago, maybe Intel figured out something new.
>
>
> Mark
>
> Mark Sale MD
> Next Level Solutions, LLC
> www.NextLevelSolns.com
>
Received on Wed Mar 14 2007 - 10:30:10 EDT

The NONMEM Users Network is maintained by ICON plc. Requests to subscribe to the network should be sent to: nmusers-request@iconplc.com.

Once subscribed, you may contribute to the discussion by emailing: nmusers@globomaxnm.com.