
Speeding up H5 file access for shell force results?

I'm trying to use Python to access Nastran quad element shell forces via the H5 file. I've had good luck with processing SPC, MPC, CBUSH, and BAR forces, but when trying to extract shell forces from a large model my Python script is taking a SUPER long time. For example, I have an H5 file that contains results for 93,407 Quad4 elements for 28 subcases. The snippet of code below is what I'm using:
 
import tables

keys = ['MX','MY','MXY','BMX','BMY','BMXY','TX','TY']

data = tables.open_file("bigmodel.h5")
quad = data.root.NASTRAN.RESULT.ELEMENTAL.ELEMENT_FORCE.QUAD4_CN

out = open("outfile.csv", "w")  # open the CSV file to write to

for i in quad_ids:                      # quad_ids: list of all quad element ids
    out.write("%d" % i)
    for case in subcases:               # subcases: subcase title and domain id
        print("case = %s" % case['subtitle'])
        domid = case['domid']
        # keyid is the element id field name (e.g. 'EID')
        for a in quad.where("(DOMAIN_ID == %d) & (%s == %d)" % (domid, keyid, i)):
            for b in keys:
                out.write(",%f" % (a[b][0]))
    out.write("\n")

out.close()
data.close()
 
The quad_ids[] list contains all the quad element ids, and subcases{} is a dictionary containing the subcase title and domain id for each subcase.
 
I ran this code on the H5 file and it ran for 6 hours straight before I finally killed it. It was working ok, but it ran super, super slow. The total number of records in the QUAD4_CN table is 93,407 x 28 = 2,615,396.
 
I'm a novice at using PyTables, so I tried my best to optimize the speed using the in-line query (which supposedly uses the C-compiled search). Has anyone else experienced this slowness and can offer any advice on speeding up my code?
 
Thanks.
  • Hi David,
     
    You have several nested for loops, and the data is being written to the CSV file at multiple points inside them. This can be time consuming.
    Instead, try appending the data to a list and writing it all to the CSV at the end.
     
    You can also try using pandas to read the data from the H5 file, which should be faster than PyTables and may not even need the for loops.
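     
    For example, something along these lines (just a rough sketch of the idea, reusing the quad, quad_ids, subcases, and keys from your script, and assuming EID is the element id column in your file):
     
    rows = []                                   # build the output in memory first
    for i in quad_ids:
        fields = ["%d" % i]
        for case in subcases:
            for a in quad.where("(DOMAIN_ID == %d) & (EID == %d)" % (case['domid'], i)):
                for b in keys:
                    fields.append("%f" % a[b][0])
        rows.append(",".join(fields))
     
    with open("outfile.csv", "w") as out:       # single write at the end
        out.write("\n".join(rows))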
     
    Regards,
    Vivek
     
     
  • Thanks for the suggestion, Vivek. Using Pandas makes a lot of sense to get better performance. I've never used Pandas before, so do you have a simple example you can share that shows how to access the shell forces from an h5 file?
     
    Thanks.
  • Ok, I've been investigating Pandas and here's what I got so far:
     
    I tried using the following to read the h5 file in a Python script,
     
    import pandas as pd
    data = pd.read_hdf("big_model.h5",'/NASTRAN/RESULT/ELEMENTAL/ELEMENT_FORCE/QUAD4_CN')
     
    >> ValueError: Wrong number of items passed 5, placement implies 1
     
    The ValueError message is appearing because several columns of data in the h5 file contain not a single value but a list object containing 5 values (e.g., [4,1,2,180,179] for the 'GRID' column which has the number of nodes and the 4 node ids for this quad element). The results file contains these lists because the Nastran output request was for centroid and element corner node forces (5 data values) for each element.
    One solution would be to only request centroid forces in the Nastran bdf file which would write out only one value for the columns in the h5 file. However, I'd like to find a way to get this to work with the corner forces written to the h5 file.
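     
    One idea I've been toying with (just a sketch, not tested on the full file) is to skip read_hdf, read the table with PyTables into a NumPy record array, and build the DataFrame myself by splitting each 5-value force column into separate centroid/corner columns. This assumes the force columns come back with shape (n_rows, 5) and that the EID/DOMAIN_ID field names match my file:
     
    import tables
    import pandas as pd
     
    keys = ['MX','MY','MXY','BMX','BMY','BMXY','TX','TY']
     
    with tables.open_file("big_model.h5") as data:
        quad = data.root.NASTRAN.RESULT.ELEMENTAL.ELEMENT_FORCE.QUAD4_CN
        rec = quad.read()                      # whole table as a NumPy record array
     
    df = pd.DataFrame({'EID': rec['EID'], 'DOMAIN_ID': rec['DOMAIN_ID']})
    for k in keys:
        # each force column has shape (n_rows, 5): centroid followed by the 4 corner grids
        for j in range(rec[k].shape[1]):
            df['%s_%d' % (k, j)] = rec[k][:, j]
     
    I'm not sure how well a single read() would handle a table this size, though.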
     
    Any suggestions or advice is appreciated.
     
    Thanks.
  • Dave, the bottleneck in your code is the .where() query. It's a good way to get the data, but it is an expensive operation. I ran some timing tests, and it's about 3.5 sec/query on my laptop. The time is somewhat independent of the search condition; in other words, the time to search for "EID==1000" is about the same as for "DOMAIN_ID==10" or "(DOMAIN_ID==10) & (EID==1000)". Your process needs 2,615,396 searches (93,407 x 28), so this will take "forever".
    The key to improving performance is minimizing the number of .where() searches. You can do this by using read_where() to pull all the results for a domain into a NumPy array in a single call. It would look something like this (to get all results for 1 domain):
    read_arr = quad.read_where("(DOMAIN_ID == %d)"%(domid)) 
    # read_arr is a NumPy record array
    This takes less than 120 seconds to read all 28 domain ids. Once you have read_arr, you will have to rearrange it for output to csv. That depends on your Python skills. I have some ideas I can share.
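    Something along these lines, for example (a rough sketch only, reusing your keys list, subcases dictionary, and quad table handle, and assuming the EID field; it also writes one line per element per subcase rather than one line per element, so you would still need to rearrange it to match your original layout):
     
    keys = ['MX','MY','MXY','BMX','BMY','BMXY','TX','TY']
     
    with open("outfile.csv", "w") as out:
        for case in subcases:                                     # 28 read_where() calls in total
            read_arr = quad.read_where("(DOMAIN_ID == %d)" % case['domid'])
            for row in read_arr[read_arr['EID'].argsort()]:       # keep rows in element-id order
                out.write("%d,%s" % (row['EID'], case['subtitle']))
                for b in keys:
                    out.write(",%f" % row[b][0])                  # [0] = value at the element centroid
                out.write("\n")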
     
    As an aside, I would be surprised if Pandas is faster than PyTables. Pandas sits on top of PyTables in the Python stack.
  • BTW, your force results are a NumPy array (not a list). This occurs when you request results at multiple locations (centroid + grids). You will see this in other places: element stresses and strains are examples. Note: PyTables and h5py can work with an array of arrays (like your data structure). You can see how this works if you deconstruct this statement: out.write(",%f" % (a[b][0]))
    a is the object returned from the .where() query (in your case it's one row of data)
    a[b] is the 'b' field/column of data (e.g., the fields for MX, MY, etc.)
    a[b][0] is the first element in that field, in this case the value at the element centroid. a[b][1] would be the force at the first grid, etc.
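    For example (assuming domain 10 and element 1000 actually exist in your file):
     
    row = quad.read_where("(DOMAIN_ID == 10) & (EID == 1000)")[0]
    mx_centroid = row['MX'][0]        # MX at the element centroid
    mx_corner1  = row['MX'][1]        # MX at the first corner grid
    grids       = row['GRID']         # e.g. [4, 1, 2, 180, 179]: grid count plus the 4 grid ids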