UAH GRIDS Center Middleware Testing
Sandra Redman
Information Technology and Systems Center and Information Technology Research Center
National Space Science and Technology Center
256-961-7806
[email protected] [email protected] www.itsc.uah.edu
“…drowning in data but starving for knowledge”
Data glut affects business, medicine, military, science
How do we leverage data to make BETTER decisions???
Information
User Community
Data Mining
• Automated discovery of patterns, anomalies from vast observational data sets
• Derived knowledge for decision making, predictions and disaster response
http://datamining.itsc.uah.edu
Mining Environment: When, Where, Who and Why?
WHEN •Real Time •On-Ingest •On-Demand •Repeatedly
WHERE •User Workstation •Data Mining Center •GRID
WHO •End Users •Domain Experts •Mining Experts
WHY •Event •Relationship •Association •Corroboration •Collaboration
Algorithm Development and Mining (ADaM)
ADaM consists of:
• a data mining engine
• an extensible set of core functional applications to aid researchers in defining and performing data mining operations on spatial data sets
• data mining modules as Open Grid Services Architecture (OGSA) services
ADaM Engine Architecture
Data flow: Data, Translated Data, Preprocessed Data, Patterns/Models, Results
Processing stages:
Input: HDF, HDF-EOS, GIF, PIP-2, SSM/I Pathfinder, SSM/I TDR, SSM/I NESDIS Lvl 1B, SSM/I MSFC Brightness Temp, US Rain, Landsat, ASCII Grass Vectors (ASCII Text), Intergraph Raster, Others...
Preprocessing:
• Selection and Sampling: Subsetting, Subsampling, Select by Value, Coincidence Search
• Grid Manipulation: Grid Creation, Bin Aggregate, Bin Select, Grid Aggregate, Grid Select, Find Holes
• Image Processing: Cropping, Inversion, Thresholding
• Others...
Analysis:
• Clustering: K Means, Isodata, Maximum
• Pattern Recognition: Bayes Classifier, Min. Dist. Classifier
• Image Analysis: Boundary Detection, Concurrence Matrix, Dilation and Erosion, Histogram Operations, Polygon Circumscript, Spatial Filtering, Texture Operations
• Genetic Algorithms
• Neural Networks
• Others...
Output: GIF Images, HDF-EOS, HDF Raster Images, HDF SDS, Polygons (ASCII, DXF), SSM/I MSFC Brightness Temp, TIFF Images, Others...
NMI Testing
• ADaM Feature Subset Selection application chosen for testing
• Supervised pattern classification is an important technique in many domains
• Feature subset selection is the process of choosing a subset of the features from the original data set in order to maximize classifier accuracy
• It improves both the runtime and accuracy of a supervised pattern classifier by eliminating noisy, irrelevant or redundant attributes or features from the data set
• Both processor- and data-intensive
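The selection process described above can be sketched as a small wrapper search. This is a minimal illustration, not the ADaM application: it assumes an exhaustive search over all feature subsets, a simple minimum-distance (nearest class mean) classifier, and a tiny made-up data set in which the second feature is misleading. All function and variable names are hypothetical.

```python
from itertools import combinations

def nearest_mean_predict(train_X, train_y, x, features):
    """Minimum-distance classifier restricted to the chosen feature indices."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, f in enumerate(features):
            acc[k] += row[f]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        # Squared Euclidean distance from x to this class mean
        dist = sum((x[f] - acc[k] / counts[label]) ** 2
                   for k, f in enumerate(features))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def accuracy(train_X, train_y, test_X, test_y, features):
    """Fraction of held-out samples classified correctly."""
    hits = sum(nearest_mean_predict(train_X, train_y, x, features) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

def select_features(train_X, train_y, test_X, test_y):
    """Score every non-empty feature subset and keep the most accurate one."""
    n = len(train_X[0])
    best_subset, best_acc = None, -1.0
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            acc = accuracy(train_X, train_y, test_X, test_y, subset)
            if acc > best_acc:  # strict comparison: ties keep the smaller subset
                best_subset, best_acc = subset, acc
    return best_subset, best_acc

# Made-up data: feature 0 separates the classes, feature 1 is misleading.
train_X = [[0.0, 50.0], [1.0, 40.0], [9.0, -40.0], [10.0, -50.0]]
train_y = [0, 0, 1, 1]
test_X = [[0.5, -45.0], [9.5, 45.0]]
test_y = [0, 1]

best_subset, best_acc = select_features(train_X, train_y, test_X, test_y)
print(best_subset, best_acc)
```

The number of subsets grows as 2^n - 1, which is one way to see why the task is processor-intensive. Each subset is scored independently of the others, so the evaluations can be spread across machines.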
Parallel Version of Cloud Extraction
• GOES images can be used to recognize cumulus cloud fields
• Cumulus clouds are small and do not show up well in 4km resolution IR channels
• Detection of cumulus cloud fields in GOES can be accomplished by using texture features or edge detectors
Figure: GOES Image; Laplacian Filter, Sobel Horizontal Filter, Sobel Vertical Filter; Energy Computation (four blocks); Classifier; Cloud Image
Panels: GOES Image, Cumulus Cloud Mask
• Three edge detection filters are used together to detect cumulus clouds; because each filter runs independently, the approach lends itself to implementation on a parallel cluster
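The filter-then-energy stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the ADaM code: the 3x3 kernels are the standard textbook Laplacian and Sobel masks, "energy" is taken to be the mean squared filter response over an image tile, and the kernel labeled horizontal is assumed to respond to horizontal edges. The classifier step is not shown. A thread pool stands in for separate cluster nodes only to show that the three filters do not depend on each other.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard 3x3 masks; the naming convention for the two Sobel masks is an assumption.
FILTERS = {
    "laplacian": [[0, 1, 0], [1, -4, 1], [0, 1, 0]],
    "sobel_horizontal": [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
    "sobel_vertical": [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
}

def apply_filter(image, kernel):
    """Apply a 3x3 mask to interior pixels; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(kernel[i][j] * image[r + i - 1][c + j - 1]
                            for i in range(3) for j in range(3))
    return out

def energy(response):
    """Mean squared filter response (one assumed definition of energy)."""
    values = [v for row in response for v in row]
    return sum(v * v for v in values) / len(values)

def edge_energies(image):
    """Run the three filters independently and return one energy per filter."""
    names = list(FILTERS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        values = pool.map(lambda n: energy(apply_filter(image, FILTERS[n])), names)
        return dict(zip(names, values))

# Made-up 5x5 tiles: a vertical step edge and a featureless tile.
step_tile = [[0, 0, 10, 10, 10] for _ in range(5)]
flat_tile = [[7] * 5 for _ in range(5)]

step_energies = edge_energies(step_tile)
flat_energies = edge_energies(flat_tile)
print(step_energies)
print(flat_energies)
```

The resulting energies would serve as the feature vector handed to the classifier; on a cluster, each filter and its energy computation could run on a separate node.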
Feature Subset Selection Application
• Application ported to Linux
• Support Vector Machine downloaded and tested
• Developed application scripts
• Modified for Globus environment by writing simple Globus RSL file
• Ran each combination of tools on a different node on the grid
• Globus used to execute jobs on different machines
• Experimented with both real and synthetic data
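The idea of running each tool combination on a different node can be sketched as a job plan that pairs one RSL string with one node. This is an illustration only: the RSL relations used (executable, arguments, stdout) are standard GRAM attributes, but the script path, tool names, and host names are hypothetical, and the slides do not say which tools were combined or how nodes were assigned. Round-robin assignment is an assumption.

```python
from itertools import product

def make_rsl(executable, arguments, stdout):
    """Build a GRAM RSL string with executable, arguments and stdout relations."""
    args = " ".join(f'"{a}"' for a in arguments)
    return f'&(executable="{executable}")(arguments={args})(stdout="{stdout}")'

def plan_jobs(selectors, classifiers, nodes, executable):
    """Pair every selector/classifier combination with a node, round-robin."""
    jobs = []
    for i, (sel, clf) in enumerate(product(selectors, classifiers)):
        node = nodes[i % len(nodes)]
        rsl = make_rsl(executable, [sel, clf], f"{sel}_{clf}.out")
        jobs.append((node, rsl))
    return jobs

# Hypothetical tool names, hosts and script path, for illustration only.
jobs = plan_jobs(
    selectors=["forward", "backward"],
    classifiers=["svm"],
    nodes=["node1.example.org", "node2.example.org"],
    executable="/home/user/run_fss.sh",
)
for node, rsl in jobs:
    print(node, rsl)
```

Each RSL string would then be written to a file and submitted to its node with a GRAM client; the submission step is left out here because it depends on the local Globus installation.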
Figure labels: Satellite Data (two sources), Grid Mining Agent (three), Grid Processor (three), Archive X, Archive Y
Components used in testing
Globus Toolkit - the “de facto standard,” an open source software toolkit and libraries for building grid applications; resource management, scheduling, information services, file transfer
GSI-OpenSSH - a modified version of OpenSSH that adds support for GSI authentication, providing a single sign-on remote login capability for the Grid
Condor-G - workload management system for compute-intensive jobs; job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management.
Network Weather Service - monitors and dynamically forecasts the performance various network and computational resources can deliver over a given time interval
Some Lessons Learned
• Component testing went well
  • Globus documentation improved, installation trouble-free, application port straightforward
  • No problems encountered during Condor-G installation, but found a problem with Condor-G under Red Hat Linux 7.3 when using nss_ldap. Developer provided a workaround: start the name service caching daemon (nscd)
  • GSI-OpenSSH installed, but Kerberos authentication did not work since Linux was not compiled with PAM option (undocumented)
  • Network Weather Service installed, but learned we are more interested in MDS
Some Lessons Learned
• NMI Testbed Process working well
  • Answers found through NMI discussion lists from developers and other users
• Have to “sell” the grid concept to developers, administrators, users
• NMI Work proven helpful in other grid work: TeraGrid, ISS Space-based Science Operations Grid, CEOS Grid
• Need more components!