Competing in HPC, “Big Data” and Visualization
One of my challenges in coming to Missouri S&T has been to make the most effective use of our meager High Performance Computing (HPC) capabilities to stimulate learning and non-funded research. This has been an ideal opportunity for me to evaluate the rapidly evolving area of HPC with no predetermined assumptions. An early observation was that we did not have adequate supercomputing resources, but it was also apparent that those with ample resources did not necessarily produce proportional results. What we did have was an understanding of what we would do if we had more resources. Had I focused on HPC alone, I would have found myself in a resource battle, trying to gain recognition in the research community based on cores and compute capability. But we were also interested in visualization, and then along came interest in “Big Data.” What I saw was an opportunity.
The one thing I did have was the foundation of an effective research support team, including skill in adapting HPC techniques to fit differences in data and workflow requirements. I also had talented student employees who thought outside the box and exposed many new options for us. So we started to see that we could compete in processing by adapting our HPC resources to the jobs being requested. It also became increasingly apparent that we were dealing with data that benefited from some form of visualization to help identify what we should be looking for. For example, we have gotten good at presenting large data sets graphically over time, with flexible data attribute selection, when we are simply looking for anomalies. Now that we are also exploring “Big Data,” I could not help but ask why the concept of large in-memory processing for Hadoop-based data could not be married with traditional HPC and supported by our flexible visualization.
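The anomaly-hunting workflow I describe can be sketched in miniature. This is a hypothetical illustration, not our production tooling: it flags points whose deviation from a trailing window exceeds a z-score threshold, exactly the kind of signal we would then pull up visually for inspection.

```python
import statistics

def flag_anomalies(series, window=20, threshold=3.0):
    """Return indices of points that deviate sharply from a trailing window.

    A point is flagged when its z-score, computed against the mean and
    standard deviation of the preceding `window` samples, exceeds
    `threshold`.
    """
    anomalies = []
    for i in range(window, len(series)):
        prior = series[i - window:i]
        mean = statistics.fmean(prior)
        stdev = statistics.pstdev(prior)
        if stdev > 0 and abs(series[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies

# A gently varying signal with one injected spike: only the spike is flagged.
data = [10.0 + 0.1 * (i % 5) for i in range(30)]
data[25] = 50.0
print(flag_anomalies(data))  # → [25]
```

In practice we would render the full series with the flagged points highlighted, rather than print indices; the point is that a cheap statistical pass narrows millions of samples down to a handful worth looking at.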
It now appears that my first year of exploration is starting to take shape. I have strengthened my human resources and have discovered that the human element is the scarcest resource of all, or at least a flexible human resource team such as ours is. Now I have some financial resources to invest, and this understanding of how these research tools interrelate is helping to stretch what I hope to accomplish. Most of our HPC cluster is devoted to students, so we need a base HPC investment devoted to non-funded research; for us that goal is probably 1,000 cores. But our success is not going to come from those 1,000 cores. It will come from the collaborations we have developed with neighboring university computing centers, who realize that we have more to share than just HPC: we can help them optimize their thousands of cores for the specific computations desired. A good example is computational chemistry.
I mentioned exploring “Big Data,” which has become the darling of big-iron computer sales. In the simplest terms, “Big Data” is about managing large, diverse data sets and processing them with large amounts of memory. The real driver of “Big Data” is the need to analyze the massive amounts of real-time data flowing in about customer buying habits. Of course, we have been led to believe that all of our analytical investigations should be using “Big Data.” That is not true for analyzing student data, but it can be true for analyzing some forms of scientific data. And “Big Data” really means the data are too big to visualize with traditional spreadsheet-type tools. So I am thinking: why can’t we blend HPC and “Big Data” with our new, nimble visualization techniques? We have all the ingredients, and the most important turns out to be the human factor. So now I am throwing some DBAs into the equation, along with scientific software engineers, with plans to expand the visualization resources. We should be able to meet most of our processing needs locally or via sharing with regional partners. Add in efficient on-ramps to XSEDE and the Open Science Grid, and we can compete with anyone.
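The in-memory processing idea behind Hadoop-style tools is just the map/reduce pattern applied across a cluster. As a toy sketch, and purely as an assumption-laden illustration of customer-behavior analysis rather than any system we run, the same pattern fits in a few lines of plain Python when the data fit in one machine's memory:

```python
from collections import Counter
from functools import reduce

def map_phase(record):
    """Map: emit per-event counts for one log record (a space-separated line)."""
    return Counter(record.split())

def reduce_phase(left, right):
    """Reduce: merge partial counts from two partitions."""
    return left + right

# Hypothetical clickstream records; in a real deployment each partition
# would live on a different node and the reduce would merge their results.
records = [
    "login purchase login",
    "purchase browse",
    "login browse browse",
]

totals = reduce(reduce_phase, map(map_phase, records), Counter())
print(totals)
```

The appeal of marrying this with HPC is that the map phase is embarrassingly parallel and the reduce is a tree of merges, both of which a conventional cluster scheduler handles well; the visualization layer then sits on the merged totals.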