Questions tagged [gridengine]
Grid Engine is a distributed resource management (DRM) system that manages the distribution of users' workloads to available compute resources.
74 questions
3 votes
1 answer
285 views
sge: How to set hard limits of node?
The guy who administrates our cluster died suddenly and recently, so now we have to operate it ourselves until a new guy comes in. We want to change the hard runtime limit of a node on our cluster. ...
1 vote
0 answers
29 views
How to limit the master host using 1 slot for each job on gridengine?
I run Grid Engine on my cluster and would like to have a master queue where each job can only have 1 slot on the master host, but I cannot find a way to configure this. I learned that resource limits ...
1 vote
0 answers
74 views
View/request instruction sets available on SGE host
How can I view or request hosts that can handle a particular instruction set in SGE? With Slurm, to view available instruction sets on each host I can use sinfo --Node -o '%n %f', and to submit a ...
0 votes
1 answer
193 views
How to create a SGE queue which can only be assigned jobs manually?
I have 5 nodes in an SGE cluster. I'd like to make it so one of those nodes can only be used by a specific queue, "test.q". I can remove that node from hostlist on all of my other queues, and set ...
0 votes
1 answer
140 views
How to allow concurrent execution of job arrays with job share (-js)
I'm trying to have SGE run job array tasks concurrently based on the job shares parameter of qsub but it seems not to be working as expected. Is there a way to enable concurrent task execution based ...
0 votes
1 answer
82 views
How to make grid master "accept" gone hosts?
With gridengine-master 6.2u5-7.3 (Ubuntu Trusty), our /var/lib/gridengine/spool/qmaster/messages gets constantly filled with: 12/07/2016 04:11:43|worker|tools-grid-master|E|got load report of unknown ...
4 votes
2 answers
10k views
What is the difference between h_rss and h_vmem in Sun Grid Engine (SGE)?
As far as I understand, mem_free can be specified to submit a job to a host that has free memory = mem_free, whereas h_vmem is the hard limit on the memory that the job can consume and ...
0 votes
1 answer
108 views
SGE/OGS 2011 breaking Ansys Workbench Mechanical launch due to network port blocked
We are running a SGE / OGS compute cluster on EL6 and are trying to launch an interactive Ansys Workbench. This works using SSH with X Forwarding, but using qrsh fails to run the Mechanical component -...
1 vote
1 answer
1k views
Trying to get qsub to work on my cluster
Trying to get qsub to work on my cluster (single node right now, but more are coming). So far, trying to submit with qsub was returning the error: commlib error: got select error (Connection refused) ...
1 vote
1 answer
1k views
How to get statistics about pending jobs in a specific SGE queue (e.g. using qacct)?
I know I can use qacct to get all kinds of statistics about running jobs. Now, is there a way to get some statistics about how many jobs are pending and how long they need to wait on average? Bonus ...
0 votes
1 answer
8k views
SGE Grid Engine error "qsub: Unknown option"
I submit a job to SGE using the qsub command and get the error: qsub: Unknown option What unknown option?
0 votes
1 answer
418 views
How to submit job from compute node to another compute node?
I am currently working on an SGE cluster and can submit jobs using qsub on the head node. What I want to do now is create new jobs and submit them from one compute node to another. Is it ...
0 votes
1 answer
515 views
Reserving slots in an SGE parallel environment
I'm trying to set up a Parallel Environment in SGE, with allocation rule $pe_slots, but I'm having trouble with the scheduler. Single-core jobs hog the slots; there are never enough slots open at one ...
2 votes
1 answer
2k views
Set up SGE to Fill Each Node Completely Rather than Distribute Jobs
Originally posted on Stack Overflow by mistake... See PS at bottom for response from that post. I've searched for this for a while, but cannot find the answer. The problem I have is this: assume I have a ...
-1 votes
1 answer
968 views
How can I fix SGE jobs stuck in queue
We are using Son of Grid Engine 8.1.8 on a grid of Debian 7.8 machines installed from deb packages. This has been working well until today, when a user submitted a processing stream, but all parts end ...
0 votes
1 answer
2k views
qsub is working but qrsh fails and only when resources are specified explicitly with -l. Why?
I am getting the following error while submitting a simple interactive job to open a shell: qrsh -V -cwd -verbose -q nsnel6.q -l h_vmem=12.000G tcsh local configuration arslox51 not defined - using ...
2 votes
1 answer
3k views
Why is there a concept of slots in SGE?
According to the SGE 5.3 manual: "Slots - The number of jobs which may be executed concurrently in that queue". I am new to these concepts and want to start by understanding them one by one. For example, if ...
1 vote
0 answers
172 views
SGE qsub concatenate different requests for different hosts
I wonder whether it is possible to concatenate multiple requests for different hosts in an SGE qsub command. For example, I tried this: qsub -l h="compute-0-[0-9]" -pe smp 6 -l h="compute-0-2[0-9]" -pe smp 4 ...
0 votes
1 answer
523 views
How to set up SGE for the following scheduling: "try first to run in queue A, if no A-slots available, try to run in queue B"?
Assume you have two queues, queue A with some new hardware, and queue B with old hardware. Further, both queues have the same number of nodes and slots for SGE jobs, e.g. 10 slots per queue. Now I ...
0 votes
1 answer
215 views
Sun Grid Engine Dynamic resource allocation
My cluster is running Sun's Grid Engine version "GE 6.2u5 $Date: 2009/12/01 12:24:06 $". I'd like to submit a single job to the queue which is defined by a bash script containing a number of commands. ...
1 vote
1 answer
181 views
OGE no value for load_avg
There is a problem with my OGE configuration. The load_avg for the nodes does not get set (remains at -NA-). Because of this and because of the np_load_avg threshold on the queue no jobs are being run....
1 vote
1 answer
2k views
Setting environment variables for login shells under root account in FreeBSD
I am currently in the process of configuring Open Grid Scheduler in FreeBSD. As part of this process I need to set the environment variable SGE_ROOT as root. To do this I have been experimenting with ...
0 votes
2 answers
2k views
High Low priority queues SGE
I would like to have two queues in a CPU/GPU cluster, one with high priority and one with low priority. Thus, jobs that are submitted in the high priority queue will be bumped to the top of the list ...
1 vote
1 answer
2k views
How to get your qsub script information that you submitted before? [closed]
Since qstat only shows limited information, see the following as an example. But I want to know the details of a job AAA (say it was submitted by qsub sample, this sample script I guess must be stored ...
4 votes
1 answer
369 views
What does the qstat output jclass mean?
What does the qstat output jclass mean? $ qstat -help UGE 8.1.4 $ qstat -u myusername job-ID prior name user state submit/start at queue jclass ...
2 votes
0 answers
209 views
Programmatically add EC2 execute nodes to Grid Engine cluster
I am running Grid Scheduler (fka Sun Grid Engine) on Amazon Web Services. Master node is running all the time, but I want to programmatically add nodes to the cluster (also remove - but remove is not ...
1 vote
1 answer
3k views
How do I recover a Berkeley DB (included in a Sun Grid Engine installation)?
I'm on CentOS 5. [root@newjanux spooldb]# uname -a Linux newjanux 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux and SGE 6.2u2 I already have copies of the original ...
0 votes
1 answer
136 views
Sun Grid Engine: jobs are not well balanced
I use Open Grid Scheduler (a fork/copy of Sun Grid Engine). I have tried this configuration from master: # qconf -mattr exechost complex_values slots=8 slave2 # qconf -mq all.q | grep slots slots ...
0 votes
1 answer
845 views
How do I impose a slot limit for sge on my user without manager rights?
On our cluster I could use up all the slots with the current configuration of sge. I don't want to overuse my privileges by accident / check at every submit. Is there a way to impose a hard limit on ...
8 votes
1 answer
16k views
Track memory usage of a job on SGE
I'm looking for some guidance on how to precisely figure out how much RAM my job is using on my cluster. My job is not multi-threaded and runs on a single CPU. When I run my job and run "top" I can ...
13 votes
2 answers
598 views
Sun Grid Engine huhohshdhjha
When I type qstat -h, I get the following option: [-s {p|r|s|z|hu|ho|hs|hd|hj|ha|h|a}] show pending, running, suspended, zombie jobs, jobs with a user/operator/...
1 vote
1 answer
467 views
qdel deletes all my jobs
I'm using Sun GridEngine (Rocks Cluster) on a server to run remote jobs. When I try to remove jobs with qdel, it often works as expected, but every now and then it just deletes almost everything it ...
1 vote
1 answer
168 views
SGE: downtime planning
I need to plan a downtime for a maintenance of my environment (or some part of my environment) by means of Sun Grid Engine. Is it possible to somehow use backfilling information to tell the grid ...
0 votes
1 answer
1k views
SGE Auto configured consumable resource?
I am using a tool called StarCluster (http://star.mit.edu/cluster) to boot up an SGE-configured cluster in the Amazon cloud. The problem is that it doesn't seem to be configured with any pre-set ...
1 vote
0 answers
97 views
systemimager and sun grid engine queuing system
I'm about to install our new cluster. I've installed the first node and used it as the golden image. As queuing software we use SGE (Sun Grid Engine). After installing the first node I tested ...
1 vote
1 answer
150 views
Grid Engine Resource Requirement
Does anyone know about setting a requirement to use a particular cluster node? I have a server with 128G of RAM that I'd like to sit idle until a user specifically requests something like -l h_vmem=...
0 votes
1 answer
245 views
Sun Grid Engine VNC temporary sessions
Since I've been in a world of hurt with FreeNX attempting to get shadowing to work, I stumbled across a brief description describing starting vnc sessions through an SGE job (our firewall rules would ...
2 votes
1 answer
646 views
Prevent users running processes on cluster head node
What ways are there to prevent users from starting long-running, resource-intensive processes on the head node of a Rocks cluster? I've tried: asking politely; setting the nice level in limits.conf to ...
4 votes
2 answers
153 views
Are there abstraction layers for cluster resource schedulers?
I'm writing an application that could potentially be run on any cluster resource scheduler (SGE, LSF or SLURM to name a few of them), using very basic functionalities. I'm wondering if a framework/...
2 votes
1 answer
163 views
qsub: How can I find out what DRM middleware exactly is installed on a cluster?
I have a user account on a very big cluster. I have previous experience with Grid Engine and want to use the cluster for array jobs. The documentation tells me to use "qsub" for load balancing / ...
1 vote
2 answers
1k views
Using CUDA_VISIBLE_DEVICES with sge
Using SGE with a resource complex called 'gpu.q' that allows resource management of GPU devices (these are all NVIDIA devices). However, on the systems there are multiple GPU devices (in exclusive mode) ...
2 votes
1 answer
1k views
Using ionice Over Cluster
Background: I use a computing cluster at work (4 slave nodes and 1 head node) that uses the SGE job scheduler. Recently we've been running jobs that do some heavy IO and it has been slowing down ...
1 vote
1 answer
252 views
OpenMPI in SGE fails when not observed
I know the topic is weird but so is my problem. On our cluster we have SGE with OpenMPI compiled for tight integration. When I set it up it worked just fine in my tests and so far there have been no ...
1 vote
1 answer
543 views
How to limit jobs in an indicated queue in SGE
I have two queues in SGE for different purposes. Each of them has a limit in slots. What I want is to have only a certain number of jobs submitted to a queue waiting even when the other queue is idle. ...
1 vote
2 answers
379 views
Sun Grid Engine (SGE) Jobs Not Visible After Adding virtual_free
I'm trying to use virtual_free to limit the number of large-memory jobs running on each grid node in my cluster. This seems to be working as expected. After I modified my code to submit jobs with the ...
2 votes
1 answer
774 views
Is there a way to tell SGE to run specific jobs as root on the execution node?
The title kinda says it all... We're using SGE/OGE to submit jobs to a set of worker nodes that then do things with specific pieces of equipment. The programs and scripts that have been created that ...
0 votes
1 answer
161 views
Can I add info to SGE's internal accounting AFTER job submittal?
I'm writing a front-end script to enable users to simply submit and query jobs to a gridengine cluster. Specifically, we want to be able to display via this script info about all of the queues ...
5 votes
4 answers
12k views
How to set up SGE for CUDA devices?
I'm currently facing the problem of integrating GPU servers into an existing SGE environment. Using Google I found some examples of clusters where this has been set up, but no information on how this ...
3 votes
1 answer
187 views
Is it a bad idea to add lots of machines as submit hosts in an SGE environment?
We're replacing a home-grown queueing system with SGE/OGE. The current work environment has engineers using their own local Linux workstation to submit jobs. So I'm wondering about adding many ...
3 votes
1 answer
334 views
asynchronous job queueing in sun grid engine (SGE) - possible?
We are looking to deploy a queueing system, and SGE is looking like it will meet nearly all of our wishes. However, we had the idea of supporting both a synchronous and asynchronous queueing model. ...