I'm trying to use Python to access Nastran quad element shell forces via the H5 file. I've had good luck with processing SPC, MPC, CBUSH, and BAR forces, but when trying to extract shell forces from a large model my Python script is taking a SUPER long time. For example, I have an H5 file that contains results for 93,407 Quad4 elements for 28 subcases. The snippet of code below is what I'm using:
out = open("outfile.csv,"w") # - open the CSV file to write to
for i in quad_ids:
out.write("%d"%i)
for case in subcases:
print "case = %s"%case['subtitle']
domid = case['domid']
for a in quad.where("(DOMAIN_ID == %d) & (%s == %d)"%(domid,keyid,i)):
for b in keys:
out.write(",%f" % (a[b][0]))
out.write("\n")
out.close()
data.close()
The quad_ids list contains all of the quad element IDs, and subcases is a collection of dictionaries, each holding a subcase title and its domain ID.
I ran this code on the H5 file and it ran for 6 hours straight before I finally killed it. It was working, but it ran extremely slowly. The total number of records in the QUAD4_CN table is 93,407 x 28 = 2,615,396.
I'm a novice with PyTables, so I did my best to optimize the speed by using the in-kernel query (which supposedly uses the C-compiled search). Has anyone else experienced this slowness, and can you offer any advice on speeding up my code?
Dave, the bottleneck in your code is the .where() query. It's a good way to get the data, but it is an expensive operation. I ran some timing tests, and it's about 3.5 sec/query on my laptop. The time is largely independent of the search condition: a search for "EID==1000" takes about as long as a search for "DOMAIN_ID==10" or "(DOMAIN_ID==10) & (EID==1000)". Your process needs 2,615,396 searches (93,407 x 28), so it will take "forever". The key to improving performance is minimizing the number of .where() searches. You can do this by reading whole query results into a NumPy array. It would look something like this (to get all results for one domain):
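(A minimal sketch of that approach: one read_where() call per DOMAIN_ID instead of one query per element per subcase. The file name and node path are illustrative and should be adjusted to your model; quad, subcases, and domid follow the original snippet, while read_arr and results_by_domain are just names I picked.)

import tables

# Open the Nastran result file read-only; node path is illustrative.
data = tables.open_file("model.h5", mode="r")
quad = data.get_node("/NASTRAN/RESULT/ELEMENTAL/ELEMENT_FORCE/QUAD4_CN")

results_by_domain = {}
for case in subcases:                     # same subcases collection as above
    domid = case['domid']
    # read_where() returns a NumPy structured array with every row
    # (every element) that belongs to this domain / subcase
    read_arr = quad.read_where("DOMAIN_ID == %d" % domid)
    results_by_domain[domid] = read_arr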
Reading all 28 domain IDs this way takes less than 120 seconds. Once you have read_arr, you will have to rearrange it for output to CSV. That depends on your Python skills; I have some ideas I can share.
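(One possible way to do that rearrangement, as a sketch: build a per-domain lookup from element ID to row, then write the CSV in the same layout as the original script. keyid, keys, quad_ids, and subcases are the same variables as in the original snippet; eid_index is an illustrative name.)

# Build a fast in-memory lookup per domain: element ID -> row index.
eid_index = {}
for domid, arr in results_by_domain.items():
    eid_index[domid] = {eid: n for n, eid in enumerate(arr[keyid])}

# Write the CSV in the same layout as the original script, but with
# plain dictionary/array lookups instead of repeated H5 queries.
with open("outfile.csv", "w") as out:
    for i in quad_ids:
        out.write("%d" % i)
        for case in subcases:
            arr = results_by_domain[case['domid']]
            rec = arr[eid_index[case['domid']][i]]
            for b in keys:
                out.write(",%f" % rec[b][0])   # [0] = first value in the field, as in the original
        out.write("\n")

data.close()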
As an aside, I would be surprised if Pandas is faster than PyTables. Pandas sits on top of PyTables in the Python stack.