Tuesday, June 09, 2015

PAMA–Progressive Analysis Methodology & Algorithm

The following article is an idea for my research.

Abstract

Pattern recognition is a branch of Machine learning that focuses on recognition of patterns and regularities in data, although in some cases, it is considered to be nearly synonymous with machine learning.

Source: Wikipedia, http://j.mp/PatRecog

This article is an early vision for identifying the right algorithm to recognize the patterns available in data, with the help of the parameters exposed by the data points of any given situation.

1. Introduction

The challenge in this theory is identifying the patterns that evolve from the enormous amount of data at any given time. Beyond that, it is identifying the right pattern from the possible combinations and executing the analysis with a common methodology.

The success of this methodology depends on the algorithm used to analyze the available pattern. The crucial step is to match the current data points against the previous ones.

To perform analysis on any given number of data points, there has to be a baseline for the given data. The baseline plays a key role in this methodology for analyzing and recognizing the pattern.

Very often, these data points express some similarity among themselves. Understanding these similarities and deriving a pattern from them is exactly what defines the behavior of the data points.

A huge number of documents and research papers are available on pattern recognition, but far fewer on pattern analysis and the need for a common algorithm.

2. Problem Statement

Humans have a special capability and can quickly identify the patterns in their surroundings with the intellect possessed within the nervous system. Machines, in contrast, are purely algorithm based and function with the logic that was fed to them at some point in time. Machine learning happens only when there is a sufficient amount of data and the right baseline for this data.


2.1. Dimensional Model

At times, three data points are sufficient for defining the baseline when plotting a linear curve.

[Figure: three sample data points]

In this example, these three points reveal a pattern when they are analysed on an X-Y axis graph.

[Figure: the three data points plotted on an X-Y graph]

This graph gives us a lot of information. But if we have to understand the pattern, the angle of the slope between these data points is enough to identify it. Once the pattern is identified, the distance between any two data points helps in predicting the next data point along the straight line.
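As a rough sketch of this step (my own illustration in C#, not the final PAMA algorithm), the slope check between consecutive points and the prediction of the next point could look like this:

```csharp
// Minimal sketch: detect a linear pattern from the slope between consecutive
// points, then extrapolate the next point along the same line.
using System;

class LinearPatternSketch
{
    static void Main()
    {
        // Hypothetical sample points on an X-Y graph.
        (double X, double Y)[] points = { (1, 2), (2, 4), (3, 6) };

        // Slope between the first two points acts as the reference.
        double slope = (points[1].Y - points[0].Y) / (points[1].X - points[0].X);

        // The pattern is "linear" if every consecutive pair has (roughly) the same slope.
        bool isLinear = true;
        for (int i = 1; i < points.Length - 1; i++)
        {
            double s = (points[i + 1].Y - points[i].Y) / (points[i + 1].X - points[i].X);
            if (Math.Abs(s - slope) > 1e-9) { isLinear = false; break; }
        }

        if (isLinear)
        {
            // Predict the next point by stepping one more X interval along the line.
            var last = points[points.Length - 1];
            double step = points[1].X - points[0].X;
            Console.WriteLine($"Linear pattern; predicted next point: ({last.X + step}, {last.Y + slope * step})");
        }
        else
        {
            Console.WriteLine("No simple linear pattern found.");
        }
    }
}
```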

Thus, it is clear that a common methodology like this, together with an algorithm to predict the pattern, will help the machine learn as data accumulates over time.

In the current digital world, an enormous amount of data is available for all kinds of analysis. What is missing is the right algorithm to analyze this data, as well as a proven methodology that lets machines learn from past data and continue to learn from the ongoing stream of present data.

2.2. Machine Learning through Pattern Analysis

Any machine learning is materialized through theories and research on the available data. These theories are put into practice with past data as well as data currently streaming from all available sources, such as medical equipment, personal gadgets, web servers, social networks, and networks of connected systems.

Hence, any such machine learning can be classified as a rule-based algorithm. These rules form the identified methodology for the data points, and the baseline shifts as the data points vary over a period of time.

If we divide a region of a space into regular cells, then the number of such cells grows exponentially with the dimensionality of the space. The problem with an exponentially large number of cells is that we need an exponentially large quantity of training data in order to ensure that the cells are not empty [1]. The key idea behind large quantities of data is the training that is embedded into the machines that process the data. The success of this analysis lies in identifying the description and behavior expressed by these large quantities of data.

These large quantities of data will not yield any benefit if they are not processed with the right algorithm and defined with reference to their boundaries as well as their measurement units. The level or scale of measurement depends on the properties of the data; it is very important to establish the scale of measurement of the data to determine the appropriate statistical test to use when analyzing it [2].

3. Problem Interpretation

Any data that is analyzed has to be measured against the baseline of its current state. In most cases, the baseline lies between the central tendency and the sum of the central tendency and a quarter of the length of the current series. The relation among these data points can also be analyzed by regression analysis.

[Figure: formula defining the baseline pointer]

where

bl is the baseline pointer,

n is the natural set of the data, and

l is the length of the data.

This definition of the baseline pointer is the initial proposal and interpretation, made before the research on this concept.
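In rough symbols, and assuming the median as the measure of central tendency (an assumption on my part; the original formula may differ), the proposal reads:

```latex
% Baseline pointer bl for a series n of length l
% (median assumed here as the measure of central tendency)
\operatorname{median}(n) \;\le\; bl \;\le\; \operatorname{median}(n) + \frac{l}{4}
```

Any value in this range can serve as the starting baseline, which then shifts as the data points vary over time.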

This baseline is identified by the repetition of values over the same intervals. Once the first repetition of these values is found, the baseline can be prepared from that first occurrence of the repeated data points.


3.1. Baseline Justification

When a sine curve is plotted over negative as well as positive values, it gives a clear mirror image across all values.

[Figure: sine curve plotted over negative and positive values]
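In mathematical terms, this mirror behavior is simply the sine function being odd:

```latex
\sin(-x) = -\sin(x) \quad \text{for all } x
```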

When the same is plotted with repeated data points, a graph like the one below is obtained.

[Figure: the same curve plotted with repeated data points]

Another example of understanding the common behavior of data points and identifying the baseline point is the electrocardiogram (ECG) of a human.

[Figure: sample human ECG trace]

Source: ECGPedia.org

All such data points produce data that is meaningful and helps in understanding the behavior. These are small amounts of data that can be calculated over a short span of measurement. The challenge, however, is not with small sets of data over short spans. To understand or analyze the data points, it is very important to start from the beginning and apply the understanding mechanism regressively. Thus, as the data progresses, the learning becomes an asset for the algorithm and the analysis becomes easier. Hence the progressive analysis methodology and algorithm.

3.2. Regressive Study

While understanding data points, it is necessary to understand their properties. Each data property should be identified with a finite set of variables that are common across all these data points. Understanding these variables and selecting them is nothing but a particular form of model selection [3]. Once the data points are identified with the common data properties, the processing methodology will evolve. A single data property and its related data points call for linear regression. When the model is a nonlinear combination of parameters that depend on one or more independent data points, a nonlinear regression model can be applied.

The choice among these models can be made by the common algorithm that learns the data points progressively. The input for this algorithm is the identification of predictive parameters along with the data behavior. Hence, a stepwise regression has to be performed on all the data points.
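As an illustration of the linear-regression case above (a generic least-squares sketch of my own, not the PAMA algorithm itself), fitting a line y = a + b·x to the data points and extending it one step looks like this:

```csharp
// Minimal least-squares linear regression: fits y = a + b*x to the data points,
// which can then serve as a baseline to extend as new points stream in.
using System;
using System.Linq;

static class SimpleLinearRegression
{
    public static (double Intercept, double Slope) Fit(double[] x, double[] y)
    {
        double meanX = x.Average();
        double meanY = y.Average();

        // Slope b = sum((x - meanX)(y - meanY)) / sum((x - meanX)^2)
        double num = 0, den = 0;
        for (int i = 0; i < x.Length; i++)
        {
            num += (x[i] - meanX) * (y[i] - meanY);
            den += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = num / den;
        double intercept = meanY - slope * meanX;
        return (intercept, slope);
    }

    static void Main()
    {
        // Hypothetical series of observations.
        double[] x = { 1, 2, 3, 4, 5 };
        double[] y = { 2.1, 3.9, 6.2, 7.8, 10.1 };

        var (a, b) = Fit(x, y);
        Console.WriteLine($"y = {a:F2} + {b:F2} * x; predicted y(6) = {a + b * 6:F2}");
    }
}
```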

4. Conclusion

While working with machine learning, it is very important to have a methodology and a set of algorithms for analyzing the pattern and forming a basis for pattern recognition.

There are a lot of known pattern recognition solutions available, but they are all prepared on paired sets of variables and with prescribed data rules. Common guidelines, as well as a reusable set of variables, are very much a necessity for machine learning and predictive analysis.


[1] Christopher M. Bishop, "Pattern Recognition and Machine Learning", Chapter 1.4, The Curse of Dimensionality.

[2] Sherri L. Jackson, "Research Methods and Statistics: A Critical Thinking Approach", Chapter 3, Defining, Measuring and Manipulating Variables, Scales of Measurement.

[3] Juha Reunanen, "Overfitting in Making Comparisons Between Variable Selection Methods", Journal of Machine Learning Research 3 (2003), 1371-1382.

Tuesday, October 15, 2013

How to accelerate any (Soft Ware) project

Nowadays, I've been helping the pre-sales folks finalize prospective projects for our employer. Recently, one of our prospective clients asked for tips, from our experience, on accelerating a project. I've done some investigation and found the following points. I'm documenting them here before I forget them, so that the effort in identifying these points does not end in vain.

Process

1) Expand the team with dedicated developers for each module – UnitDrivenDevelopment
2) Plan the build delivery
3) DB management should be an ongoing activity
4) Integrate testing as often as possible
5) Prioritize the Interface Requirement Specifications (IRSs) as frequently as possible

Approach

1) Escape the probability approach
2) Avoid problematic functionality
3) Document the traceability index
4) Have realistic (nearly real) data with the developers
5) Plan for asynchronous processing

Technology

1) Have the logs in place within the code
2) Plan the audit functionality for the DB
3) Automate & escalate all event / exception logs to the respective folks
4) Have loosely coupled objects interact through a mutual contract pattern
5) A task-driven user interface should be the theme for the workforce
6) Think of XML/REST/JSON over ASMX/WCF/etc.
7) REST over HTTP would be a great idea

Thursday, December 27, 2012

Self Managed Teams

This post is not about any technical concept, but about the dynamics of a given team (or) a comparison of traditional and agile teams.

Traditional Teams         | Parameter                      | Self-Managed Teams
Direction and Control     | Manager's Role                 | Coaching, facilitation, and support
Manager                   | Responsibility for Performance | Team
Top Down                  | Information Flow               | Downwards, upwards and cross
Narrow, single-task roles | Job Role/Function              | Whole process and multiple tasks
Top Down, imposed         | Decision Making                | By Team, within agreed boundaries
Manager                   | Authority                      | Team
Static                    | Development of team            | Evolutionary

The above table is taken from Executive Capsule, a magazine that is currently not in print.

Wednesday, December 26, 2012

Requirements of COM Architecture

This post might seem funny, writing about COM in the time of the cloud. Yet these are long-pending notes that I prepared and never documented. This felt like the time to document them; of course, there is plenty of information available on the web. This would be an extra page that defines the COM architecture.

I see the need for this architecture in four directions.

# INTEROPERABILITY

Components must be compatible with

1) the application's environment where they are used

2) components developed by 3rd-party vendors

# VERSIONING

Components must NOT require upgrades when the applications containing them are upgraded

# TOOL INDEPENDENCE

Components must PERMIT development tool interoperability, allowing a component to be created with any development tool (or) language at design time

# DISTRIBUTED FUNCTIONALITY

Components must have the ABILITY to function not only in-process but also across processes and networks, i.e., DCOM

 

Final NOTE: COM is not a piece of software, but a framework

Sunday, November 18, 2012

What type of Project is needed by your client?

These days, everyone is moving to cloud apps. More or less, they are deciding to make their applications available on the web. Is it a trend, or are all requirements demanding to become web apps?

Before we proceed further, my understanding is that the need for the application decides the nature of the application, not the investor nor any other party involved in the application life cycle. NOT EVEN THE END USER.

But how do we understand the nature of the application?

The below questionnaire helps in understanding the current requirement.

Sl. | Question | Yes/No/NA
1 Are your users comfortable using a Web browser?  
2 Are your users located in remote sites?  
3 Do your users in remote sites have access to the Internet?  
4 Are you creating a Business to Business (B2B) application?  
5 Are you creating a Business to Consumer (B2C) application?  
6 Is the amount of data entered minimal?  
7 Is the amount of data to display on the screen minimal?  
8 Is the number of fields on the screen fairly small?  
9 Does each user "own" their data?  
10 Are the same rows of data rarely updated by multiple users at the same time?  
11 Is this application mainly for light data entry, where speed of data entry isn't critical?  
12 Is there a lot of data review that requires "tall" pages?  
13 Do your users like to scroll through the data, as opposed to tabbing through data?  
14 Are there minimal data items on a screen that cause other data to change on that same screen?  
15 Can your users minimize the need to exchange data dynamically with other products running on the same desktop?  
16 Is performance a secondary consideration?  
17 Do your developers (or you) want to develop for the Web?  
18 Do your developers (or you) have the skills to develop for the Web (or can they quickly learn how)?  
19 Do you want a very graphically appealing look and feel?  
20 Do you have a lot of large screens that would warrant scrolling windows?  
21 Is it important to keep deployment costs to a minimum?  
22 Is it important to keep upgrade costs to a minimum?  
23 Will there be frequent updates to software?  
24 Can you hire/train Web programmers more cheaply than desktop programmers?  
25 Do investors and/or shareholders want a Web application?  
26 Would your users prefer a browser interface to a desktop application?  
27 Do users in remote sites have a high-speed connection to your internal network?  
28 Is it fairly easy to install Internet access in remote sites?  
29 When your users travel, do they usually have access to the Internet?  
30 Does the application interact with other providers over the Internet (such as weather notifications, GPS, GIS, etc.)?
31 Is this application only for one department?  
32 Is there a need to connect to special hardware?  
33 Do you need Drag-and-Drop support in this application?  
34 Are you designing a game, CAD, or CAM application?  
35 Do you need a lot of special controls for limiting data input (such as input masks)?  
36 Can deployment of this system be done through a network, by distributing CDs, or using push servers?  
37 Do the majority of users make use of this application from their work desks?
38 Is there a scope of gadgets usage in the near future?  
39 Should the application keep functioning when the data is modified by other users (and yet warn the user before committing)?
40 Should the application have a facility for auditing data manipulation?
41 Can the application be hosted on intranet of the deployment location?  
42 Is the application free of any need for data sharing from online portals (such as XE for the latest currency details)?
43 Is the count of simultaneous users at any given point greater than 10?
44 Is the count of concurrent users at any given point greater than 1000?
45 Are the user systems built with Intel 'x'Core / Pentium technology?
46 Are the user systems within the secured environment?  
47 Will there be multiple versions of this application, with buyers licensed with respect to these versions?
48 Can upgrades of this system be done through a network, by distributing CDs, or using push servers?  
49 Is the application free of any demand for extensive processing power?
50 Can the application be split into different hosts, depending on the roles of the different users?  

 

Now that we have some understanding of the requirement, how do we compile this information?

Step 1) Replace every Yes / No / NA answer with 1 / 0 / -1 respectively

Step 2) Add the responses from Q1 to Q31 and store the total separately

Step 3) Add the responses from Q32 to Q50 and store the total separately

Step 4) Subtract the Step 3 result from the Step 2 result

You will get a final number, either positive or negative (a small code sketch of this scoring follows the responses below).

Response 1) Any number greater than 10 indicates the need for a web application

Response 2) Any non-negative number less than 10 indicates the need for a Windows application

Response 3) Any negative number indicates a smart client (or) split application architecture
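Here is a small C# sketch of this scoring, using the steps and thresholds above (the type and method names are mine, for illustration only):

```csharp
// Scores the 50-question checklist: Yes = 1, No = 0, NA = -1 (Step 1).
using System;

enum Answer { Yes = 1, No = 0, NA = -1 }

static class ProjectTypeScorer
{
    // answers[0] corresponds to Q1 ... answers[49] to Q50.
    public static string Classify(Answer[] answers)
    {
        if (answers.Length != 50) throw new ArgumentException("Expected 50 answers.");

        int first = 0, second = 0;
        for (int q = 1; q <= 50; q++)
        {
            int value = (int)answers[q - 1];
            if (q <= 31) first += value;    // Step 2: Q1..Q31
            else second += value;           // Step 3: Q32..Q50
        }

        int final = first - second;         // Step 4
        if (final > 10) return "Web application";      // Response 1
        if (final >= 0) return "Windows application";  // Response 2 (a score of exactly 10 is not covered by the original responses; treated as Windows here)
        return "Smart client (or) split application architecture";  // Response 3
    }

    static void Main()
    {
        var answers = new Answer[50];
        for (int i = 0; i < 31; i++) answers[i] = Answer.Yes;  // hypothetical: all web-leaning questions answered Yes
        for (int i = 31; i < 50; i++) answers[i] = Answer.NA;
        Console.WriteLine(Classify(answers));                  // "Web application" (31 - (-19) = 50)
    }
}
```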

Wednesday, May 30, 2012

Best practices for exception handling within .NET

While studying the best practices for exception handling, tracking, etc. in .NET, I've ended up with the information below.

Best practices in Exception handling within .NET

Before identifying the best practices, it is mandatory to identify the situations where exceptions are generated and handle them according to the needs of each situation. Below is the recommended workflow for exception handling.


 Fig 1: Exception Workflow

[Source: the above diagram is copied from Christian Thilmany's blog post.]

There are two ways of handling these exceptions:

· Catch them & act immediately
  o Early
  o Late
· Catch them to record (or) log & act afterwards

Catch & Act

This mechanism is generally implemented in all coding practices. The approach is implemented with the "try .. catch .. finally" block, which is the finest mechanism provided by the .NET Framework. Yet it is expensive to use and implement: the "try .. catch .. finally" block causes an extra amount of memory-management code. The .NET Framework therefore also provides the flexibility of implementing custom exception classes to handle exceptions within the application. Hence we have the two options below:

· Use the inbuilt TRY CATCH FINALLY, i.e., the Structured Exception Handling (SEH) mechanism

· Implement custom EXCEPTION classes

Conclusion & Recommendation

Use "TRY..CATCH..FINALLY" to catch the exception early and customize the exceptions into generic classes. Implement the needed functionality in the custom classes. Thus, I recommend:

1. Implement the "TRY..CATCH..FINALLY" block in all the unknown scenarios

2. Implement a custom exception class for all the known exceptions

3. Implement the "TRY..CATCH..FINALLY" block only at the data layer
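A minimal C# sketch of recommendations 2 and 3 (the class, method, and scenario names are mine, purely for illustration):

```csharp
using System;
using System.Collections.Generic;

// Recommendation 2: a custom exception class for a known failure scenario.
public class CustomerNotFoundException : Exception
{
    public int CustomerId { get; }

    public CustomerNotFoundException(int customerId)
        : base($"Customer {customerId} was not found.")
    {
        CustomerId = customerId;
    }
}

// Recommendation 3: TRY..CATCH..FINALLY kept at the data layer.
public class CustomerRepository
{
    // Stand-in for a real data store, so the sketch is self-contained.
    private readonly Dictionary<int, string> _store = new Dictionary<int, string> { { 1, "Alice" } };

    public string GetCustomerName(int customerId)
    {
        try
        {
            if (!_store.TryGetValue(customerId, out var name))
                throw new CustomerNotFoundException(customerId);   // known scenario -> custom exception
            return name;
        }
        catch (Exception ex) when (!(ex is CustomerNotFoundException))
        {
            // Recommendation 1: unknown scenarios are caught here (and could be logged).
            throw;
        }
        finally
        {
            // Cleanup (e.g. closing a connection) would go here.
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var repo = new CustomerRepository();
        Console.WriteLine(repo.GetCustomerName(1));            // Alice
        try { repo.GetCustomerName(42); }
        catch (CustomerNotFoundException ex) { Console.WriteLine(ex.Message); }
    }
}
```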

Record & Act

This mechanism is generally implemented in portals where exceptions are not handled immediately through code, but are recorded for the purpose of investigation. In this mechanism, all exceptions are consumed by the code and recorded. These exceptions are not exposed to the end users; a generic message (or) a generic page is shown when any source throws an exception. Hence, the end user is not aware of the original exception message, which is recorded for future reference.

At periodic intervals, these recorded exception messages are investigated and the root cause is traced. There are various implementation practices for this kind of coding. One of the best mechanisms is to have these exceptions caught and recorded at:

· Application level

· Session level

· Request level

All these exceptions, caught at various levels, are recorded by various means. The commonly used practice is to write them to the event log under a specific EVENT source; it is not a WISE practice to write to the generic event log.
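One possible shape of this, for an ASP.NET application with an application-level handler (the event source name and error page are illustrative, not a prescription):

```csharp
// Global.asax.cs: catch unhandled exceptions at the application level
// and record them to a dedicated event source instead of showing them to the user.
using System;
using System.Diagnostics;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_Error(object sender, EventArgs e)
    {
        Exception ex = Server.GetLastError();
        if (ex == null) return;

        // "MyPortal" is an illustrative event source; it must be registered on the server beforehand.
        EventLog.WriteEntry("MyPortal", ex.ToString(), EventLogEntryType.Error);

        Server.ClearError();                  // consume the exception ...
        Response.Redirect("~/Error.aspx");    // ... and show a generic error page
    }
}
```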

There are various reusable exception-handling frameworks available today. Below are a few names that are popularly used:

· EHAB: Enterprise Library Exception Handling Application Block

· log4net: widely used by the industry in previous years

· ELMAH: gaining industry adoption, as the newer versions are proving to be friendly

These tools can record the exceptions to a variety of locations, like the ones mentioned below:

· The event log

· An e-mail message

· A database

· A message queue

· A text file

· A WMI event

· Custom formats and locations: XML formats, encrypted formats, etc., stored on network drives
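For instance, with log4net (one of the frameworks above), the recording itself reduces to a couple of calls; the actual destinations (event log, file, database, e-mail) are decided by the appenders configured in the application's config file, which is not shown in this sketch:

```csharp
// Minimal log4net usage: appenders and their targets live in App.config/Web.config.
using System;
using log4net;
using log4net.Config;

class LoggingSketch
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(LoggingSketch));

    static void Main()
    {
        XmlConfigurator.Configure();   // reads the log4net section from the config file

        try
        {
            throw new InvalidOperationException("Something went wrong.");
        }
        catch (Exception ex)
        {
            // Recorded to whichever appenders are configured (event log, file, database, ...)
            Log.Error("Unhandled failure in LoggingSketch", ex);
        }
    }
}
```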

 

Conclusion & Recommendation

Use "TRY..CATCH..FINALLY" to catch the exception early and customize the exceptions into generic classes. Implement the needed functionality in the custom classes. Thus, I recommend:

1. Implement the "TRY..CATCH..FINALLY" block ONLY in all the unknown scenarios

2. Implement a custom exception class for all the known exceptions

3. Implement the "TRY..CATCH..FINALLY" block ONLY at the data layer

NOTE: I highly recommend the implementation of ELMAH. But we need to have a process in place to investigate and deal with the exceptions.

On top of it all, at the bottom of my heart, I'm an ardent believer in "not using the TRY CATCH block". Thus, most of my code is written without TRY..CATCHes, and I've been successful all these days. I need to learn where I would fail, so that I start using TRY..CATCHes.