Sunday, June 23, 2013

Data Mining: What You Might Not Know

The ultimate goal of data mining is not the acquisition of data, but the exploration and analysis of massive amounts of data in order to uncover patterns, rules, and relationships. One of the key outcomes is the identification of reliable and meaningful patterns.

Meaningful patterns can do the following:

* Model typical behaviors
* Identify atypical behaviors
* Express possible cause and effect relationships
* Explain past behaviors
* Describe current conditions
* Develop a predictive model for the future

While patterns drawn from different types of data can show meaningful relationships, time-series data mining can also be used to postulate the following:

* causality
* life-cycle behaviors
* impacts of proximal or distal relationships
* cluster formation and disaggregation

Characteristics of Data Mining Data

The data may represent changes in behavior and activity over time or, alternatively, the relationship among different types of data collected at a single point in time. Typical cases include the following:

* Same type of data, collected at different points in time
* Different types of data, collected at the same point in time
* Data streams, which involve ordered sequences of items that arrive over time:
  * offline: regular chunked arrivals
  * online: continuous flow
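
The difference between the two kinds of stream arrival can be sketched in a few lines of Python. This is only an illustration: the function names `online` and `offline`, the `chunk_size` parameter, and the sample readings are assumptions made for the example, not part of any particular data mining library.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def online(stream: Iterable[int]) -> Iterator[int]:
    """Online: pass each item along as soon as it arrives."""
    for item in stream:
        yield item


def offline(stream: Iterable[int], chunk_size: int) -> Iterator[List[int]]:
    """Offline: buffer arrivals and release them in regular chunks."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical sensor readings, used only for illustration.
readings = [3, 1, 4, 1, 5, 9, 2]
online_items = list(online(readings))
offline_chunks = list(offline(readings, chunk_size=3))
```

In the online case the miner sees seven separate arrivals; in the offline case it sees three chunks, the last of which is shorter than the others.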

Planning the Data Mining Process

A data mining project can follow a fairly clear process:

1.  Define the problem
2.  Determine the characteristics of the data
3.  Develop a plan for data mining
4.  Review similar data mining projects and algorithms
5.  Become familiar with the data, data issues, and potentially meaningful data subsets
6.  Prepare and condition the data
7.  Develop the model and algorithm
8.  Evaluate the model and compare it with other models
9.  Implement the results, which involves generating reports or continuing to develop ongoing activities

Data Mining: Key Tasks

Key tasks in data mining include the following:

* Identification
* Eliminate unnecessary or distracting frequent item sets
* Definition and differentiation
* Optimize storage and recovery of streams
* Classification
* Cluster recognition
* Segmentation
* Discovery of motifs (sub-sequences)
* Detection of similar clusters or sets
* Detection of outliers and anomalies
* Create predictive models

Data mining that involves continuous streams of data presents unique challenges because of the nature of the data and the types of patterns that are meaningful, given the array of patterns that it is possible to develop. It is challenging to integrate incoming data with existing databases in order to qualitatively evaluate patterns in a timely way. It is also challenging to avoid the "concept drift" problem, in which the usefulness and validity of the results degrade over time because the underlying data changes.
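
One simple way to watch for concept drift is to compare a model's recent error rate with an earlier baseline. The sketch below assumes that approach; the `DriftMonitor` class, its window size, and its tolerance are hypothetical choices made for illustration, and real drift detectors are usually more sophisticated.

```python
from collections import deque


class DriftMonitor:
    """Flags possible concept drift when recent errors exceed a baseline.

    Hypothetical helper: window size and tolerance are illustrative values.
    """

    def __init__(self, window: int = 5, tolerance: float = 0.25) -> None:
        self.window = window
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)  # most recent prediction outcomes
        self.baseline = None                # error rate of the first full window

    def update(self, error: int) -> bool:
        """Record one outcome (1 = wrong prediction, 0 = right prediction)."""
        self.recent.append(error)
        if len(self.recent) < self.window:
            return False                    # not enough data yet
        rate = sum(self.recent) / len(self.recent)
        if self.baseline is None:
            self.baseline = rate            # first full window sets the baseline
            return False
        return rate - self.baseline > self.tolerance


monitor = DriftMonitor(window=5, tolerance=0.25)
# Made-up outcomes: the model starts out mostly right, then degrades.
outcomes = [0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1]
flags = [monitor.update(e) for e in outcomes]
```

The monitor stays quiet while the error rate is close to the baseline and raises a flag only once the run of recent errors pushes the rate well above it.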

In general, data streams arise in the following kinds of applications:

* Sensors and surveillance: networks, physical locations, manufacturing, transportation
* Performance monitoring: manufacturing, networks, controls
* Transaction / activity monitoring: retail, web performance, manufacturing


A literature review of algorithms suggests that data mining for data streams is generally performed using three major classes of algorithms (landmark window, damped window, and sliding window), and that they do not yield the same results. The differences could be quite significant, depending on the application.

Landmark Window Based Data Mining

What is mined is the data that arrived between a specific time-stamp (the landmark) and the present.

Pros: Complete comparison with an a priori property
Cons: The order in which information is considered and placed into sets can lead to errors
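
A minimal sketch of the landmark idea, assuming the stream is a sequence of `(timestamp, item)` pairs and that the task is simple item counting. The function name `landmark_counts` and the sample events are invented for the example.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def landmark_counts(
    events: Iterable[Tuple[int, Hashable]], landmark: int
) -> Counter:
    """Count every item whose time-stamp is at or after the landmark."""
    counts: Counter = Counter()
    for timestamp, item in events:
        if timestamp >= landmark:
            counts[item] += 1
    return counts


# Hypothetical (timestamp, item) events.
events = [(1, "a"), (2, "b"), (3, "a"), (4, "c"), (5, "a")]
counts = landmark_counts(events, landmark=3)
```

Everything from the landmark onward is weighted equally, so the counts keep growing for as long as the landmark stays fixed.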

Damped Window

This approach privileges new data over old or historical data, which means that the older data gradually drops out of consideration when developing sets.

Pros: Efficient use of resources, eliminates old and obsolete information
Cons: Large errors may be made because the information being eliminated may be important for the rule to be effective
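
One common way to privilege new data is to multiply every existing count by a decay factor each time a new item arrives. The sketch below assumes that exponential-decay scheme; the function name `damped_counts` and the `decay` value are illustrative, and other weighting schemes are possible.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable


def damped_counts(items: Iterable[Hashable], decay: float) -> Dict[Hashable, float]:
    """Shrink all existing counts by `decay`, then add the new item."""
    counts: Dict[Hashable, float] = defaultdict(float)
    for item in items:
        for key in counts:
            counts[key] *= decay   # older evidence fades a little
        counts[item] += 1.0        # the newest arrival gets full weight
    return dict(counts)


weights = damped_counts(["a", "a", "b", "b"], decay=0.5)
```

Both items appear twice, but "b" arrived more recently and ends up with the larger weight. With an aggressive decay factor, an item that matters but appears only occasionally can fade away entirely, which is the risk noted in the cons above.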

Sliding Window

The sliding window approach favors new data (as in the damped window approach), but does not completely eliminate old data. Instead, it incorporates summarized versions of old data and data relations.

Pros: Can incorporate past data and do so relatively quickly
Cons: The assumptions made to create summaries of old data sets can be flawed
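
The behavior described above can be sketched as a window that keeps the newest items exactly and reduces everything older to a summary. The `SummarizingWindow` class is hypothetical, and its summary (a bare tally of aged-out items) is deliberately crude. Many sliding window algorithms in the literature simply discard items that leave the window, so this is one interpretation rather than a standard implementation.

```python
from collections import Counter, deque
from typing import Deque, Hashable


class SummarizingWindow:
    """Keeps the newest `size` items exactly and a coarse summary of the rest."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.recent: Deque[Hashable] = deque()
        self.evicted_total = 0     # summary: only how many items have aged out

    def add(self, item: Hashable) -> None:
        self.recent.append(item)
        if len(self.recent) > self.size:
            self.recent.popleft()  # the oldest item leaves the exact window
            self.evicted_total += 1

    def recent_counts(self) -> Counter:
        """Exact item counts for the items still inside the window."""
        return Counter(self.recent)


window = SummarizingWindow(size=3)
for item in ["a", "a", "b", "c", "b"]:
    window.add(item)
recent = window.recent_counts()
```

After five arrivals the window holds "b", "c", "b" exactly, while the two earlier "a" items survive only as a tally of two. The summary no longer records what those items were, which is the kind of information loss the cons above warn about.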

General Observations and Conclusions

The ability to collect data continues to expand, sometimes dramatically, thanks to technological advances in both hardware and software. However, a review of the processes and the literature makes it clear that the algorithms used to process and make meaning of data batches and streams differ widely. Consequently, the results and conclusions reached through data mining (in both collection and analysis) can be highly variable. Decisions based on data mining therefore need to be made carefully, and more than one analytical technique and set of algorithms should be used.


Esling, P., & Agon, C. (2012). Time-Series Data Mining. ACM Computing Surveys, 45(1), 12:1-12:34.

Mala, A. A., & Dhanaseelan, F. (2011). Data Stream Mining Algorithms: A Review of Issues and  Existing Approaches. International Journal On Computer Science & Engineering, 3(7), 2726-2732.

Ramageri, B. M., & Desai, B. L. (2013). Role of data mining in retail sector. International  Journal On Computer Science & Engineering, 5(1), 47-50.

Wednesday, June 19, 2013

Interview with Matt Loper, Innovators in E-Learning Series

One might argue that a special platform for online tutoring is unnecessary; after all, free platforms such as Skype and FaceTime are very effective. Further, open-source web conferencing tools such as BigBlueButton have many of the features (whiteboard, tools, etc.) of Adobe Connect or Omnovia. However, locating vetted tutors and subject matter experts is another matter. Moonlyt has launched a new platform to integrate the different elements: video chat, scheduling, tutor profiles, etc.

Welcome to an interview with Matt Loper, co-founder and CEO of Moonlyt.

1.  What is your name and your relation to elearning?
My name is Matt Loper and I am the Co-Founder and CEO of Moonlyt.

2.  What do you think are some of the issues that currently concern elearning program developers?
There are a lot of great companies and nonprofit organizations out there trying to completely change the paradigm of traditional education.  Change is definitely necessary as the American educational system is extremely inefficient and expensive.  However, with so much established behavior and infrastructure, there is quite a bit of inertia to completely overhauling the entire system.  Moonlyt seeks to supplement the established system so that students can get the help they need at affordable prices.  Private tutoring is a huge market (some studies show 1 in 4 American high school students have a private tutor) but it is also completely fragmented to local markets.  Moonlyt is a global platform that allows tutors to eliminate geographical constraints by teaching over video chat.  We strive to add accountability to tutoring by allowing students to comparison shop tutors based on user reviews and prices.

3.  What is Moonlyt and how does it work?
Moonlyt is a social learning and online tutoring platform that allows highly qualified individuals to make money teaching online.  We also allow students and parents to find the very best tutor at the very best prices based on past user reviews.  On Moonlyt, tutors set their own prices and schedules and perform all lessons on our video chat platform.

4.  Please provide two or three examples of how Moonlyt has been used (please share jpeg screen shots as well).
Moonlyt just launched a few months ago but already features a diverse set of tutors including guitar, SAT preparation, mathematics, sciences, foreign languages, cooking lessons, and career advice.  I have personally done several sessions giving career advice to college students looking to get into the financial field or entrepreneurship.

5.  Describe plans for future developments in Moonlyt.
We are working hard to further enhance the interaction between tutors and their students.  We think that the Moonlyt platform allows for online tutoring sessions to be as effective as in-person tutoring.  Our goal is to create tools and analytics that allow Moonlyt tutors to be far more effective than in-person tutors.

6.  What is your philosophy of education in a multi-platform BYOD world?
Technology in the classroom has the ability to greatly enhance students' experience and access to information while allowing teachers to leverage their time and scale their audience.  However, we must remember that technology is a supplement to great teachers, not a substitute.  We will always need great teachers to inspire and motivate students.

7.  What are two or three books that you'd recommend? (They do not have to be on elearning ;) )
Thinking, Fast and Slow by Daniel Kahneman - This book gave me a deeper understanding of how we think and interact with each other.
The 4-Hour Chef by Tim Ferriss - A book about the most efficient way to learn any new skill, disguised as a cooking book.
