We recently switched our main shared storage platform to a Lustre-based disk array. Since doing so, Nastran jobs appear to take longer to complete and occasionally hang entirely (only to work fine on a re-run).
Reports are that the processes go into a wait state of "futex_wait_queue_me".
Is there any precedent for this? We have also queried the hardware vendor and are awaiting a response, but at this point only Nastran seems to be impacted.
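In case it helps anyone checking the same thing, here is a minimal sketch of how the wait state could be sampled on a Linux compute node. It reads the standard `/proc/<pid>/task/<tid>/wchan` entries; the function names and the `proc_root` parameter are hypothetical, chosen only for illustration. Note that a futex wait only shows that a thread is blocked on a userspace lock, so on its own it does not point at the filesystem.

```python
from pathlib import Path


def read_wchan(pid, proc_root="/proc"):
    """Return the kernel wait channel reported for pid, or None if unreadable."""
    try:
        text = (Path(proc_root) / str(pid) / "wchan").read_text().strip()
    except OSError:
        # Process has exited, or we lack permission to read the entry.
        return None
    return text or None


def threads_wchan(pid, proc_root="/proc"):
    """Map each thread id of pid to its wait channel (empty dict if pid is gone)."""
    task_dir = Path(proc_root) / str(pid) / "task"
    try:
        tids = sorted(int(p.name) for p in task_dir.iterdir() if p.name.isdigit())
    except OSError:
        return {}
    # Each thread has its own wchan file under /proc/<pid>/task/<tid>/.
    return {tid: read_wchan(tid, proc_root=task_dir) for tid in tids}
```

Calling `threads_wchan(pid)` for a Nastran process ID a few times while a job appears hung should show whether all threads sit in the same wait channel or only some of them. The exact strings reported depend on the kernel version.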
Hi Neil, We have not seen or received reports of any such issues. It would be interesting to know what the vendor has to say. Do these jobs run successfully on other filesystems? Looking at the f04 files from two successful jobs may help determine where the job is taking more time. You may also consider opening a support request with the relevant details, including the Nastran version.
Hello, Thanks for the reply. Yes, they ran on the filesystem we replaced. The inconsistency (and the lack of obvious hard errors) is what makes it tricky to track down, but it's good to know you haven't seen any specific issues with Lustre - that's at least one thing I can tick off the list. I'll see if the f04 file indicates anything and, if we get no further, I will raise a support request.