Title

Modular Abstraction of Complex Real Time Analysis

Presentation Type

Presentation

Abstract

The high technical barrier to entry for Big Data and real-time analytics has hindered groups and projects that seek insight in large volumes of data or in data that changes rapidly. A modular framework that lets stakeholders rapidly write and deploy filters to the IBM InfoSphere Streams product would provide the needed abstraction and make Big Data technology much easier to implement and use.

A common data pipeline, built around a simple schema, will be developed in tandem with this project. This schema provides a common mechanism for rapidly identifying information across a variety of large and changing data sources. In addition to this common schema, a modular interface implemented using primitive operators will provide a simple API in a common programming language, so that domain experts can develop filters without a comprehensive understanding of the underlying IBM InfoSphere Streams technical details.
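As a rough illustration of the idea, the sketch below shows what a common schema and a small filter API might look like in plain Python. It is not the project's interface and does not use any IBM InfoSphere Streams API; the names `Record`, `apply_filters`, `from_source`, and `payload_above`, as well as the sample data, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Record:
    """Hypothetical common schema: every source is normalized to this shape."""
    source: str                      # where the record came from
    timestamp: float                 # seconds since some agreed epoch
    payload: Dict[str, Any] = field(default_factory=dict)


# A filter is any function that accepts a Record and returns True to keep it.
Filter = Callable[[Record], bool]


def apply_filters(records: Iterable[Record], filters: List[Filter]) -> Iterator[Record]:
    """Lazily yield only the records accepted by every filter."""
    for record in records:
        if all(f(record) for f in filters):
            yield record


def from_source(name: str) -> Filter:
    """Keep records that originate from the named source."""
    return lambda r: r.source == name


def payload_above(key: str, threshold: float) -> Filter:
    """Keep records whose payload value for `key` exceeds `threshold`."""
    return lambda r: r.payload.get(key, float("-inf")) > threshold


# Made-up sample data standing in for a live stream.
stream = [
    Record("sensor", 1.0, {"temp": 71.5}),
    Record("sensor", 2.0, {"temp": 98.2}),
    Record("weblog", 3.0, {"status": 500}),
]

hot = list(apply_filters(stream, [from_source("sensor"), payload_above("temp", 90.0)]))
```

In this sketch a domain expert only writes small predicate functions against the shared `Record` shape. How such filters would be bound to primitive operators in a running streams application is left out, since the abstract does not describe it.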

While considerable effort has gone into making large-scale data analysis more expressive and powerful, this has come at the expense of agile development and ease of implementation for non-technical users. The novelty of this project is that it offers a subset of this expressive power together with ease of use, facilitating rapid, iterative development. This allows domain experts to explore data more quickly.

This research project provides value by abstracting much of the complexity of filtering a rapidly growing and changing set of data. Stakeholders can then implement Big Data and real-time analysis rapidly and with technical agility. Whether applied to business intelligence, academia, or government, this approach is valuable because it gives users an accessible way to understand data with more clarity.

Category

Physical Sciences

Apr 11th, 10:20 AM Apr 11th, 10:40 AM
