Data mining/warehousing

Please follow the instructions in the Milestone Two Word file. Use Milestone One (included) for the topic and complete Milestone Two based on that concept. Milestone One is not to be reworded and reused; new material that adds to it must be created according to the directions provided. A model must also be created, as stated in the instructions. Please use the following book as citation material: Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, by Foster Provost and Tom Fawcett (O'Reilly Media). You are free to use any other sources for citation.

Assignment: Milestone Two Instructions

Class,

Since Chapter 6 of the book focuses a great deal on classification tasks, this assignment asks you to see how gathering data from neighbors can be used to classify a new instance in a super-simple setting.

Once an object can be represented as data, we can begin to talk more precisely about the similarity between objects, or alternatively the distance between objects. For example, let’s consider the data representation we have used throughout the book so far: represent each object as a feature vector. Then, the closer two objects are in the space defined by the features, the more similar they are.

To use similarity for predictive modeling, the basic procedure is beautifully simple: given a new example whose target variable we want to predict, we scan through all the training examples and choose several that are the most similar to the new example. Then we predict the new example’s target value, based on the nearest neighbors’ (known) target values. How to do that last step needs to be defined; for now, let’s just say that we have some combining function (like voting or averaging) operating on the neighbors’ known target values.
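The nearest-neighbor procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's implementation: the function name knn_predict, the toy customer data, and the choice of Euclidean distance with majority voting as the combining function are all assumptions made for the example.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+


def knn_predict(training, new_point, k=3):
    """Predict a label for new_point from its k nearest training examples.

    training is a list of (feature_vector, label) pairs (hypothetical format).
    """
    # Scan all training examples and keep the k most similar (closest) ones.
    neighbors = sorted(training, key=lambda ex: dist(ex[0], new_point))[:k]
    # Combining function: majority vote over the neighbors' known labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (age, income in $1000s) -> responded to an offer?
training = [
    ((25, 40), "no"),
    ((30, 60), "no"),
    ((45, 80), "yes"),
    ((50, 95), "yes"),
    ((48, 70), "yes"),
]

prediction = knn_predict(training, (47, 85), k=3)
print(prediction)
```

For a numeric target, the vote could be replaced by an average of the neighbors' values. In practice the features would usually be rescaled first so that no single feature (here, income) dominates the distance.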
The combining function will give us a prediction. In addition, semi-supervised classification, active learning, and transfer learning are all useful in situations where unlabeled data are abundant.

Continuing where you left off in Milestone One:

III. Data Preparation
• Explain how these data are integrated to produce the format required for data mining.

IV. Modeling
• Specify the type of model(s) built and/or patterns mined.
  o Use a figure, graphic, or table to represent the data model.
• Discuss choices for the data mining algorithm: what are the alternatives, and what are their pros and cons?
• Discuss why and how this model should “solve” the business problem (i.e., improve along some dimension of interest to the firm).

Most of us are familiar with Amazon’s common "Customers who purchased X also purchased Y" feature. Every time we buy something, or even just click a link on the site, that action is recorded, tracked, collated, and analyzed before being put to work on our next visit. If you were to create a classification tree for Amazon, how many branches and nodes would it have?

In this assignment you are asked to design a tree for your organization and explain how data classification supports the business model. How, for instance, would you include or use "nearest neighbor", if at all?
• How big would this decision tree need to be?
• How accurate do you think a tree that big would be?
• How many products or points of relational data are too many to recommend?

With Amazon's approach in mind as a business example, develop an analysis that answers the above questions and responds to the following:

Briefly explain the concepts of semi-supervised classification, active learning, and transfer learning, paying attention to how one or all of them are useful, as well as the potential challenges these approaches to classification pose.
What can the organization learn about its constituents from monitoring the "Customers who purchased X also purchased Y" approach?

________________________________________

Articulation of Response
• Always include a title page with your name, the date, the course name/number, the title of the assignment or paper, and the revision (if applicable).
• In the body of the paper, use headings and sub-headings. Do not jump from subject to subject without providing some type of heading beforehand.
• Use correct grammar and punctuation. Capitalize the first word of a sentence.
• Make the presentation as professional as possible. Think, “If someone were to look at this paper, what would they think?” Sloppy papers may have correct answers, but they still leave an overall “messy” feeling when read.
• Make sure you cite reference material in APA style within the text of your submission (e.g., according to John, “citing in text is a key concept in this course” [Doe, 2013]).