GSD Realtime Systems Product Strategy or “What the TReK is going

March 20, 2018 | Author: Anonymous | Category: N/A

UAH GRIDS Center Middleware Testing

Sandra Redman
Information Technology and Systems Center and Information Technology Research Center
National Space Science and Technology Center
256-961-7806
[email protected] [email protected]
www.itsc.uah.edu

“…drowning in data but starving for knowledge”

Data glut affects business, medicine, military, science

How do we leverage data to make BETTER decisions???

Information

User Community

Data Mining

• Automated discovery of patterns, anomalies from vast observational data sets
• Derived knowledge for decision making, predictions and disaster response

http://datamining.itsc.uah.edu

Mining Environment: When, Where, Who and Why?

[Diagram: Data Mining at the center, surrounded by the four questions below]

WHEN
• Real Time
• On-Ingest
• On-Demand
• Repeatedly

WHERE
• User Workstation
• Data Mining Center
• GRID

WHO
• End Users
• Domain Experts
• Mining Experts

WHY
• Event
• Relationship
• Association
• Corroboration
• Collaboration

Algorithm Development and Mining (ADaM)

ADaM consists of:

• a data mining engine
• an extensible set of core functional applications to aid researchers in defining and performing data mining operations on spatial data sets
• data mining modules as Open Grid Services Architecture (OGSA) services

ADaM Engine Architecture

Data flow: Data → Translated Data → Preprocessed Data → Patterns/Models → Results

Processing stages:

Input
• HDF; HDF-EOS; GIF; PIP-2
• SSM/I Pathfinder; SSM/I TDR; SSM/I NESDIS Lvl 1B; SSM/I MSFC Brightness Temp
• US Rain; Landsat; ASCII; Grass Vectors (ASCII Text); Intergraph Raster
• Others...

Preprocessing
• Selection and Sampling: Subsetting; Subsampling; Select by Value; Coincidence Search
• Grid Manipulation: Grid Creation; Bin Aggregate; Bin Select; Grid Aggregate; Grid Select; Find Holes
• Image Processing: Cropping; Inversion; Thresholding
• Others...

Analysis
• Clustering: K Means; Isodata; Maximum
• Pattern Recognition: Bayes Classifier; Min. Dist. Classifier
• Image Analysis: Boundary Detection; Concurrence Matrix; Dilation and Erosion; Histogram Operations; Polygon Circumscript; Spatial Filtering; Texture Operations
• Genetic Algorithms
• Neural Networks
• Others...

Output
• GIF Images; HDF-EOS; HDF Raster Images; HDF SDS
• Polygons (ASCII, DXF)
• SSM/I MSFC Brightness Temp; TIFF Images
• Others...

NMI Testing

• ADaM Feature Subset Selection application chosen for testing
• Supervised pattern classification is a technique important in many domains
• Feature subset selection is used to improve both the runtime and accuracy of a supervised pattern classifier by eliminating noisy, irrelevant or redundant attributes or features from the data set
• Feature subset selection is the process of choosing a subset of the features from the original data set in order to maximize classifier accuracy
• Both processor- and data-intensive
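The idea of feature subset selection can be illustrated with a minimal sketch. This is not the ADaM application: it assumes a simple minimum-distance (nearest-centroid) classifier, leave-one-out scoring, and an exhaustive search over subsets, and the data set and all function names are hypothetical.

```python
from itertools import combinations


def nearest_centroid_predict(train, labels, sample, features):
    """Classify `sample` by the closest class centroid, using only `features`."""
    sums, counts = {}, {}
    for row, label in zip(train, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((sample[f] - acc[i] / counts[label]) ** 2
                   for i, f in enumerate(features))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(data, labels, features):
    """Leave-one-out accuracy of the classifier restricted to `features`."""
    hits = 0
    for k in range(len(data)):
        train = data[:k] + data[k + 1:]
        train_labels = labels[:k] + labels[k + 1:]
        if nearest_centroid_predict(train, train_labels, data[k], features) == labels[k]:
            hits += 1
    return hits / len(data)


def select_features(data, labels):
    """Score every non-empty feature subset and return (best_subset, accuracy)."""
    n_features = len(data[0])
    best_subset, best_acc = (), -1.0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            acc = loo_accuracy(data, labels, subset)
            if acc > best_acc:  # strict comparison: ties keep the smaller subset
                best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Hypothetical synthetic data: feature 0 separates the classes, feature 1 is noise.
DATA = [[0.0, 5.0], [0.2, -4.0], [0.1, 9.0], [1.0, -6.0], [0.9, 7.0], [1.1, -5.0]]
LABELS = ["a", "a", "a", "b", "b", "b"]
subset, accuracy = select_features(DATA, LABELS)
print(subset, accuracy)
```

Each subset is scored independently of the others, which is one reason this kind of search is both processor-intensive and easy to spread across machines.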

Parallel Version of Cloud Extraction

• GOES images can be used to recognize cumulus cloud fields
• Cumulus clouds are small and do not show up well in 4km resolution IR channels
• Detection of cumulus cloud fields in GOES can be accomplished by using texture features or edge detectors

[Figure: processing pipeline. GOES Image; Laplacian Filter, Sobel Horizontal Filter and Sobel Vertical Filter; Energy Computation (four blocks); Classifier; Cloud Image]

[Figure panels: GOES Image, Cumulus Cloud Mask]

• Three edge detection filters are used together to detect cumulus clouds, an approach that lends itself to implementation on a parallel cluster
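A minimal sketch of the filter-and-energy stage follows. It makes several assumptions the slides do not confirm: the standard 3x3 Laplacian and Sobel kernels, energy defined as the mean squared response over a tile, a fourth energy taken from the unfiltered tile, and a thread pool standing in for cluster nodes. The threshold classifier and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard 3x3 kernels (assumed; the slides do not give coefficients).
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_HORIZONTAL = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
SOBEL_VERTICAL = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KERNELS = (LAPLACIAN, SOBEL_HORIZONTAL, SOBEL_VERTICAL)


def filter3x3(image, kernel):
    """Apply a 3x3 kernel to interior pixels; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(kernel[i][j] * image[r + i - 1][c + j - 1]
                            for i in range(3) for j in range(3))
    return out


def energy(response):
    """Mean squared value over the whole tile."""
    values = [v for row in response for v in row]
    return sum(v * v for v in values) / len(values)


def texture_features(tile):
    """Return [raw energy, Laplacian energy, Sobel-H energy, Sobel-V energy]."""
    # The three filters are independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=3) as pool:
        edge = list(pool.map(lambda k: energy(filter3x3(tile, k)), KERNELS))
    return [energy(tile)] + edge


def looks_textured(features, threshold):
    """Hypothetical stand-in classifier: threshold the summed edge energies."""
    return sum(features[1:]) > threshold


flat_tile = [[5.0] * 4 for _ in range(4)]
edge_tile = [[0.0, 0.0, 9.0, 9.0] for _ in range(4)]
print(texture_features(flat_tile))
print(texture_features(edge_tile))
```

A uniform tile yields zero edge energy, while a tile containing a brightness step does not, which is the kind of contrast a classifier can use to separate cloud texture from background.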

Feature Subset Selection Application

• Application ported to Linux
• Support Vector Machine downloaded and tested
• Developed application scripts
• Modified for Globus environment by writing simple Globus RSL file
• Ran each combination of tools on a different node on the grid
• Globus used to execute jobs on different machines
• Experimented with both real and synthetic data
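The job layout described above can be sketched as follows. The slides do not show the actual RSL file, so this only illustrates the general shape of a Globus RSL job description using the common `executable`, `arguments`, `stdout` and `count` attributes. The tool names, script path and host names are hypothetical placeholders.

```python
from itertools import product

# Hypothetical tool lists and grid nodes, for illustration only.
SELECTORS = ["selector_a", "selector_b"]
CLASSIFIERS = ["svm", "classifier_b"]
NODES = ["node1.example.org", "node2.example.org",
         "node3.example.org", "node4.example.org"]


def rsl_quote(value):
    """Wrap a value in double quotes (values are assumed to contain none)."""
    return '"' + value + '"'


def make_rsl(executable, arguments, stdout):
    """Build a one-line RSL job description string."""
    args = " ".join(rsl_quote(a) for a in arguments)
    return (f"&(executable={rsl_quote(executable)})"
            f"(arguments={args})"
            f"(stdout={rsl_quote(stdout)})"
            f"(count=1)")


def plan_jobs(selectors, classifiers, nodes):
    """Assign each tool combination to a node, cycling through the node list."""
    jobs = []
    for index, (sel, clf) in enumerate(product(selectors, classifiers)):
        node = nodes[index % len(nodes)]
        rsl = make_rsl("/hypothetical/path/run_fss.sh", [sel, clf],
                       f"{sel}_{clf}.out")
        jobs.append((node, rsl))
    return jobs


jobs = plan_jobs(SELECTORS, CLASSIFIERS, NODES)
for node, rsl in jobs:
    print(node, rsl)
```

Each generated string could be written to a file and handed to a Globus job submission client for the matching node. The exact submission commands used in the testing are not given in the slides.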

[Figure: grid mining architecture. Labels: Satellite Data, Grid Mining Agents, Grid Processors, Archive X, Archive Y]

Components used in testing

• Globus Toolkit: the “de facto standard,” an open source software toolkit and libraries for building grid applications; resource management, scheduling, information services, file transfer
• GSI-OpenSSH: a modified version of OpenSSH that adds support for GSI authentication, providing a single sign-on remote login capability for the Grid
• Condor-G: workload management system for compute-intensive jobs; job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management
• Network Weather Service: monitors and dynamically forecasts the performance various network and computational resources can deliver over a given time interval

Some Lessons Learned

• Component testing went well
  - Globus documentation improved, installation trouble-free, application port straightforward
  - No problems encountered during Condor-G installation, but found problem with Condor-G under Red Hat Linux 7.3 when using nss_ldap. Developer provided workaround: start name service caching daemon (nscd)
  - GSI-OpenSSH installed, but Kerberos authentication did not work since Linux was not compiled with PAM option (undocumented)
  - Network Weather Service installed, but learned we are more interested in MDS

Some Lessons Learned

• NMI Testbed process working well
  - Answers found through NMI discussion lists from developers and other users
• Have to “sell” the grid concept to developers, administrators, users
• NMI work proven helpful in other grid work
  - TeraGrid
  - ISS Space-based Science Operations Grid
  - CEOS Grid
• Need more components!
