NONMEM Users Network Archive

Hosted by Cognigen

Re: [Psn-general] SGE integration for psn/nonmem

From: Paul Matthias Diderichsen <pmdiderichsen>
Date: Mon, 5 Aug 2013 10:27:11 +0200

This is a partial cross-post from the psn-general mailing list, which
discusses how to run NONMEM 7.2 with MPI under PsN on SGE. The topic was
started by Andreas Loong in March 2012.

I couldn't get Andreas' scripts to work with the MPICH2 included in the
NONMEM distribution, so I tweaked them until they worked with no external
dependencies. I also simplified them a bit by putting the host definitions
directly into the parafile, which removes the need for the second
auxiliary (hosts) file. Maybe this information will save time and headaches
for somebody trying to set this up.

The setup depends on three components:
1) The definition of the parallel environment
2) A helper script that initializes each instance of the parallel
environment
3) The qsub job definition

(An MPD ring needs to be available on the system!)
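Whether such a ring is actually up can be checked before submitting anything. The sketch below assumes MPICH2's MPD tools are on the PATH and that `mpdtrace` (which lists the hosts in the ring) fails when no local mpd is running; the function name `check_mpd_ring` is hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical helper): check that an MPD ring is reachable.
# Assumption: mpdtrace exits non-zero when no mpd is running for this user.
check_mpd_ring() {
    if ! command -v mpdtrace >/dev/null 2>&1; then
        echo "mpdtrace not found (is MPICH2 on the PATH?)" >&2
        return 2
    fi
    if ! mpdtrace >/dev/null 2>&1; then
        echo "no MPD ring running; start one with mpdboot" >&2
        return 1
    fi
    # print the ring members on one line
    echo "MPD ring hosts: $(mpdtrace | tr '\n' ' ')"
}
```

A return value of 2 means the MPD tools were not found at all, 1 means they were found but no ring answered.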


1) This is the parallel environment:

"
pe_name nonmem
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args /opt/nm72/start_nonmem $pe_hostfile
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task TRUE
urgency_slots min
"
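A definition like this is normally loaded into SGE from a file with `qconf -Ap` and then added to the `pe_list` of the queue(s) the jobs should run in. A minimal sketch, assuming the file name `nonmem.pe` and the queue `all.q` (both placeholders) and SGE manager rights for the `qconf` calls:

```shell
#!/bin/bash
# Sketch: register the "nonmem" parallel environment with SGE.
# Assumptions: definition file nonmem.pe, target queue all.q.

write_pe_definition() {
    # Quoted heredoc delimiter: $pe_hostfile and $round_robin stay literal,
    # SGE substitutes them itself.
    cat <<'EOF'
pe_name            nonmem
slots              9999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /opt/nm72/start_nonmem $pe_hostfile
stop_proc_args     /bin/true
allocation_rule    $round_robin
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
EOF
}

pe_file=nonmem.pe
write_pe_definition > "$pe_file"

if command -v qconf >/dev/null 2>&1; then
    qconf -Ap "$pe_file"                      # add the PE from the file
    qconf -aattr queue pe_list nonmem all.q   # allow the PE in queue all.q
else
    echo "qconf not found; PE definition written to $pe_file only" >&2
fi
```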

2) This is the start_nonmem script for setting up the parallel environment:

"
#!/bin/bash
#
# preparation of the NONMEM parafile
# usage: start_nonmem <pe_hostfile>

GenParafile()
{
    # total number of slots = sum of the 2nd column of the pe_hostfile
    nslots=$(awk '{s += $2} END {print s}' "$1")
    echo '$GENERAL'
    echo "NODES=$nslots PARSE_TYPE=4 PARSE_NUM=50 TIMEOUTI=60 TIMEOUT=100 PARAPRINT=0 TRANSFER_TYPE=1"
    echo ""
    echo '$COMMANDS'
    NR=1
    # one line per slot; pe_hostfile columns: host, slots, queue, processor range
    # (redirection instead of "cat | while" keeps NR in the current shell)
    while read -r host cpu queue undef; do
        host=${host%%.*}            # short host name (strip the domain)
        for ((i = 0; i < cpu; i++)); do
            if [[ "$NR" == 1 ]]; then
                # first slot: starts mpiexec and the NONMEM manager process
                echo "$NR:mpiexec -host $host -np 1 ./nonmem \$*"
            else
                # remaining slots: worker processes (no mpiexec of their own)
                echo "$NR:-host $host -np 1 ./nonmem -wf"
            fi
            NR=$((NR + 1))
        done
    done < "$1"
    echo ""
    echo '$DIRECTORIES'
    # one process per slot, so the last process number equals nslots
    echo "1-${nslots}:NONE"
}

# log the arguments we were called with (useful for debugging)
echo "$*"

me=$(basename "$0")

# test number of args
if [[ "$#" -ne 1 ]]; then
   echo "$me: got wrong number of arguments" >&2
   exit 1
fi

# get arguments
pe_hostfile=$1

# ensure pe_hostfile is readable
if [[ ! -r "$pe_hostfile" ]]; then
   echo "$me: can't read $pe_hostfile" >&2
   exit 1
fi

# write the parafile into the job's scratch directory ($TMPDIR, set by SGE),
# where the job script picks it up
parafile="$TMPDIR/parafile.pnm"
GenParafile "$pe_hostfile" > "$parafile"
exit 0
"
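For testing outside SGE, it helps to know what the script receives. SGE passes the path of a file with one line per host (host name, number of slots, queue, processor range); the sketch below uses a made-up two-host example to reproduce the two things GenParafile derives from it, the `NODES` total and the per-slot host assignment.

```shell
#!/bin/bash
# Sketch: the information start_nonmem extracts from $pe_hostfile.
# Host and queue names below are hypothetical.
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
node1.example.org 4 all.q@node1.example.org UNDEFINED
node2.example.org 2 all.q@node2.example.org UNDEFINED
EOF

# NODES= in $GENERAL: the total number of slots (sum of column 2)
nodes=$(awk '{s += $2} END {print s}' "$hostfile")
echo "NODES=$nodes"

# $COMMANDS: one line per slot, using the short host name
nr=1
while read -r host slots queue range; do
    for ((k = 0; k < slots; k++)); do
        echo "$nr -> ${host%%.*}"
        nr=$((nr + 1))
    done
done < "$hostfile"

rm -f "$hostfile"
```

Reading the file with `done < "$hostfile"` keeps the loop in the current shell, so `nr` still holds 7 afterwards; with `cat "$hostfile" | while ...` bash runs the loop in a subshell and the counter is lost once the loop ends.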

3) This is the qsub script that launches an array job with a batch of
models:

"
#$ -S /bin/bash
#$ -V
#$ -cwd
#$ -sync yes
#$ -j y
#$ -pe nonmem XXX
#$ -t 1-YYY
. /etc/bashrc
# SGE task IDs start at 1, bash arrays at 0
i=$((SGE_TASK_ID - 1))
# stagger the task starts by 0, 0.5, 1, ... seconds (cycling every 40 tasks)
sleep $(echo $i | awk '{print ($1 % 40)/2}')
models=(model1.mod model4.mod model10.mod)
execute "${models[$i]}" --nm_version=nm72 --no-run_on_sge_nmfe --nmfe \
    --parafile="$TMPDIR/parafile.pnm"
"

Substitute "XXX" with the requested number of parallel processes (SGE
slots) for each model, update the "models" array, and set "YYY" to the
total number of models (here: 3). The staggered sleep is required so that
PsN can create a distinct modelfit_dirX for each model without the tasks
clobbering each other.
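For larger batches, the three manual substitutions can be generated. The helper below is a hypothetical sketch, not part of PsN or SGE: it prints the job script from section 3) for a given slot count and list of models (assumed to have no whitespace in their names), deriving the `-t` range from the number of models.

```shell
#!/bin/bash
# Sketch of a hypothetical helper: print the array job for a batch of models.
# Usage: make_array_job SLOTS model1.mod model4.mod ... > batch.sh
make_array_job() {
    local slots=$1
    shift
    printf '%s\n' '#$ -S /bin/bash' '#$ -V' '#$ -cwd' '#$ -sync yes' '#$ -j y'
    printf '#$ -pe nonmem %s\n' "$slots"   # XXX: slots per model
    printf '#$ -t 1-%s\n' "$#"             # YYY: one task per model
    echo '. /etc/bashrc'
    printf 'models=(%s)\n' "$*"
    # the remaining lines are copied literally (quoted heredoc delimiter)
    cat <<'EOF'
i=$((SGE_TASK_ID - 1))
sleep $(echo $i | awk '{print ($1 % 40)/2}')
execute "${models[$i]}" --nm_version=nm72 --no-run_on_sge_nmfe --nmfe \
    --parafile="$TMPDIR/parafile.pnm"
EOF
}
```

For example, `make_array_job 8 model1.mod model4.mod model10.mod > batch.sh` followed by `qsub batch.sh` would correspond to the three-model example with 8 slots per model.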



This approach inverts the usual order of the SGE-PsN-NONMEM stack:
normally, PsN is the "outer" component, calling SGE, which then
controls NONMEM. Here, SGE is the outer component, calling PsN, which
then calls NONMEM (with MPI). This causes problems when running, e.g.,
a bootstrap. (These can be worked around using SGE array jobs, but it's
not exactly seamless... Let me know if you want me to share details.)

So my first question is: Did anybody ever get SGE, PsN, MPI, and
NONMEM+MPI to play seamlessly together with PsN as the outer component?



An alternative, technically simpler approach is to use FPI-based (file
passing interface) parallelization. This keeps the usual order of the
components and depends on user-level files only (no parallel environment
definition necessary):

execute --nm_version=nm72 --parafile=fpilinux8b.pnm \
    --extra_files=qrshlaunch.sh modelX.mod

The parafile is based on one from the NONMEM distribution:

$ diff fpilinux8.pnm fpilinux8b.pnm
42c42
< 2-[nodes]:./beolaunch.sh worker{#-1}/ ./nonmem >worker1.out
---
> 2-[nodes]:./qrshlaunch.sh worker{#-1}/ ./nonmem >worker1.out&
47a48,51
> $CONTROL
> 2-[nodes]: MTOUCH=1 WSLEEP=5
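The two changes can also be applied with `sed` instead of by hand. This sketch assumes the worker line in `fpilinux8.pnm` reads exactly as in the diff and that the `$CONTROL` block can simply be appended at the end of the file (the diff adds it after line 47); the function name `make_qrsh_parafile` is hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical helper): derive the qrsh-based FPI parafile.
# Usage: make_qrsh_parafile fpilinux8.pnm fpilinux8b.pnm
make_qrsh_parafile() {
    local src=$1 dst=$2
    # swap beolaunch.sh for qrshlaunch.sh and background the launcher (&)
    sed 's#^2-\[nodes\]:\./beolaunch\.sh \(.*\)$#2-[nodes]:./qrshlaunch.sh \1\&#' \
        "$src" > "$dst"
    # append the $CONTROL record (assumed to belong at the end of the file)
    printf '\n%s\n%s\n' '$CONTROL' '2-[nodes]: MTOUCH=1 WSLEEP=5' >> "$dst"
}
```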

And the launcher script is quite simple:

$ cat qrshlaunch.sh
#!/bin/bash
# $1: worker directory; the remaining arguments are the command to run there
cd "$1" || exit 1
shift
qrsh -V -cwd "$@" &


This approach works, technically. However, for some reason, message
passing through qrsh is extremely slow on NFS drives, and starting the
workers takes ages. I realize that FPI is somewhat slower than MPI, but
this approach actually works well when the runs are restricted to the
master node by adding "-l hostname=master" to the qrsh call.
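The host restriction can be made switchable rather than hard-coded. In this sketch the launcher is written as a function (`qrsh_launch`, a hypothetical name), and the `QRSH_HOST` variable is an assumption of the example, not something PsN or SGE sets; when it is empty, the workers go wherever SGE schedules them.

```shell
#!/bin/bash
# Sketch of a qrshlaunch.sh variant written as a function.
# $1: worker directory; remaining arguments: the command to start there.
# QRSH_HOST (hypothetical) optionally pins workers to one host, e.g. master.
qrsh_launch() {
    local dir=$1
    shift
    local opts=(-V -cwd)
    if [[ -n "${QRSH_HOST:-}" ]]; then
        opts+=(-l "hostname=$QRSH_HOST")
    fi
    # subshell: the cd only affects the background worker, not the caller
    ( cd "$dir" && qrsh "${opts[@]}" "$@" ) &
}
```

Running the qrsh call in a subshell keeps the `cd` local to the background worker, so several workers can be launched one after another from the same calling shell.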

Second question: Does anybody know why message passing through qrsh on
NFS is so slow, and if anything can be done to speed things up?


Any input is highly appreciated!

(Thanks to Andreas Loong, for doing most of the work on the MPI
approach and start_nonmem script!)

Kind regards, PMD

On 3/2/2012 4:13 PM, Loong, Andreas wrote:
> I forgot, this is then used in an SGE submit script like this:
>
> #$ -S /bin/bash
> #$ -cwd
> #$ -l arch=lx24-amd64
> #$ -l mem_free=1G
> #$ -j y
> #$ -v PSNVERSION,PATH
> #$ -pe pe_nonmem 20
>
> . /etc/bashrc
> module load psn/3.5.3 nonmem/7.2.0
>
> execute run_Cpara.mod -parafile=$TMPDIR/parafile.pnm
>
> //Andreas
>
> ------------------------------------------------------------------------
>
> *Confidentiality Notice: *This message is private and may contain
> confidential and proprietary information. If you have received this
> message in error, please notify us and remove it from your system and
> note that you must not copy, distribute or take any action in reliance
> on it. Any unauthorized use or disclosure of the contents of this
> message is not permitted and may be unlawful.
>
>
>
> *From:*Loong, Andreas [mailto:Andreas.Loong
> *Sent:* 2 March 2012 11:20
> *To:* General Discussion about PsN.
> *Subject:* Re: [Psn-general] SGE integration for psn/nonmem
>
> First, the PE in SGE looks like this:
>
> qconf -sp pe_nonmem
>
> pe_name pe_nonmem
> slots 9999
> user_lists NONE
> xuser_lists NONE
> start_proc_args /path/start_nonmem $pe_hostfile
> stop_proc_args /path/stop_nonmem
> allocation_rule $round_robin
> control_slaves TRUE
> job_is_first_task TRUE
> urgency_slots min
> accounting_summary TRUE
>
> The start_nonmem script is responsible for creating the regular MPI host
> file. For all other MPI programs that I know of, that would have been the
> end of it. NONMEM took another approach entirely (it also needs its own
> parafile), so we had to create this PE.
>
> I've attached start_nonmem (not sure if that's allowed?) so you can get
> ideas for yourself. Comments and suggestions are welcome.
>
> //Andreas
>
> ------------------------------------------------------------------------
>
> *From:*John James [mailto:jjames
> *Sent:* 2 March 2012 10:41
> *To:* 'psn-general
> *Subject:* Re: [Psn-general] SGE integration for psn/nonmem
>
>
>
> Hi
> We do a lot of configuration with SGE and LSF for MPI and I'd be
> interested in sharing parafile thoughts.
>
> John James
> Mango Solutions
> ---------------------------------------
> www.mango-solutions.com
>
>
>
> *From*: Loong, Andreas [mailto:Andreas.Loong
> *Sent*: Friday, March 02, 2012 08:37 AM
> *To*: psn-general
> *Subject*: [Psn-general] SGE integration for psn/nonmem
>
>
> Hi,
>
> A bit late to the game, but I've just created an SGE PE that creates
> the MPI host file and the NONMEM parafile dynamically for the user. Is
> anyone interested in the configuration for this?
>
> It would be the SGE configuration for a PE, the PE start script that
> generates the parafile and an MPI hosts file. We also have a PsN wrapper
> script that makes certain assumptions, which in turn affect how our
> parafile looks. Other sites may set things up differently, so the
> scripts may have to be modified a bit.
>
> Wbr
>
> Andreas
>
> ------------------------------------------------------------------------
>
> LEGAL NOTICE
> This message is intended for the use of the named recipient(s) only and
> may contain confidential and / or privileged information. If you are not
> the intended recipient, please contact the sender and delete this
> message. Any unauthorised use of the information contained in this
> message is prohibited.
>
> Mango Business Solutions Limited is registered in England under No.
> 4560258 with its registered office at Suite 3, Middlesex House,
> Rutherford Close, Stevenage, Herts, SG1 2EF, UK.
>
> PLEASE CONSIDER THE ENVIRONMENT BEFORE PRINTING THIS EMAIL
>
> _______________________________________________
> Psn-general mailing list
> Psn-general
> https://lists.sourceforge.net/lists/listinfo/psn-general
>

--
Paul Matthias Diderichsen, PhD
Quantitative Solutions B.V.
+31 624 330 706

Received on Mon Aug 05 2013 - 04:27:11 EDT

The NONMEM Users Network is maintained by ICON plc. Requests to subscribe to the network should be sent to: nmusers-request@iconplc.com.

Once subscribed, you may contribute to the discussion by emailing: nmusers@globomaxnm.com.