I'm trying to use Python to access Nastran quad element shell forces via the H5 file. I've had good luck with processing SPC, MPC, CBUSH, and BAR forces, but when trying to extract shell forces from a large model my Python script is taking a SUPER long time. For example, I have an H5 file that contains results for 93,407 Quad4 elements for 28 subcases. The snippet of code below is what I'm using:
out = open("outfile.csv,"w") # - open the CSV file to write to
for i in quad_ids:
out.write("%d"%i)
for case in subcases:
print "case = %s"%case['subtitle']
domid = case['domid']
for a in quad.where("(DOMAIN_ID == %d) & (%s == %d)"%(domid,keyid,i)):
for b in keys:
out.write(",%f" % (a[b][0]))
out.write("\n")
out.close()
data.close()
The quad_ids list contains all of the quad element IDs, and subcases is a collection of dictionaries, each holding a subcase title and its domain ID.
I ran this code on the H5 file and it ran for 6 hours straight before I finally killed it. It was working, but it ran extremely slowly. The total number of records in the QUAD4_CN table is 93,407 x 28 = 2,615,396.
I'm a novice with PyTables, so I did my best to optimize the speed by using the in-kernel query (which supposedly uses the C-compiled search). Has anyone else experienced this slowness, and can you offer any advice on speeding up my code?
Dave, the bottleneck in your code is the .where() query. It's a good way to get the data, but it is an expensive operation. I ran some timing tests, and it's about 3.5 sec/query on my laptop. The time is largely independent of the search condition: a search for "EID==1000" takes about as long as a search for "DOMAIN_ID==10" or "(DOMAIN_ID==10) & (EID==1000)". Your process needs 2,615,396 searches (93,407 x 28), so it will take "forever". The key to improving performance is minimizing the number of .where() searches. You can do this by reading whole query results into a NumPy array. It would look something like this (to get all results for one domain):
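(A minimal sketch of that approach: one read_where() call per DOMAIN_ID instead of one query per element per subcase. The file name and node path are illustrative and should be adjusted to your model; quad, subcases, and domid follow the original snippet, while read_arr and results_by_domain are just names I picked.)

import tables

# Open the Nastran result file read-only; node path is illustrative.
data = tables.open_file("model.h5", mode="r")
quad = data.get_node("/NASTRAN/RESULT/ELEMENTAL/ELEMENT_FORCE/QUAD4_CN")

results_by_domain = {}
for case in subcases:                     # same subcases collection as above
    domid = case['domid']
    # read_where() returns a NumPy structured array with every row
    # (every element) that belongs to this domain / subcase
    read_arr = quad.read_where("DOMAIN_ID == %d" % domid)
    results_by_domain[domid] = read_arr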
Reading all 28 domain IDs this way takes less than 120 seconds. Once you have read_arr, you will have to rearrange it for output to CSV. That depends on your Python skills; I have some ideas I can share.
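(One possible way to do that rearrangement, as a sketch: build a per-domain lookup from element ID to row, then write the CSV in the same layout as the original script. keyid, keys, quad_ids, and subcases are the same variables as in the original snippet; eid_index is an illustrative name.)

# Build a fast in-memory lookup per domain: element ID -> row index.
eid_index = {}
for domid, arr in results_by_domain.items():
    eid_index[domid] = {eid: n for n, eid in enumerate(arr[keyid])}

# Write the CSV in the same layout as the original script, but with
# plain dictionary/array lookups instead of repeated H5 queries.
with open("outfile.csv", "w") as out:
    for i in quad_ids:
        out.write("%d" % i)
        for case in subcases:
            arr = results_by_domain[case['domid']]
            rec = arr[eid_index[case['domid']][i]]
            for b in keys:
                out.write(",%f" % rec[b][0])   # [0] = first value in the field, as in the original
        out.write("\n")

data.close()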
As an aside, I would be surprised if Pandas is faster than PyTables. Pandas sits on top of PyTables in the Python stack.