Nelish on Tue, 23 Aug 2016 12:33:37
In the classification technique I read that the accuracy of the model is evaluated, and only if the accuracy is over a certain threshold will the model be used in the following step, which is test data analysis: using the model previously obtained, a classification of new test data is executed. Moreover, the model can be used to improve an already existing data description. My question is: how do I evaluate whether the accuracy is over a certain threshold on the training set? Is there any technique to determine that threshold? Could you please give me a reference for that, and explain how to evaluate the training set before we test the data?
My second question: what do we mean by evaluating explorative heuristics of new test data?
Jaya Mathew on Tue, 23 Aug 2016 13:49:16
Yes, the accuracy of the model is determined on your 'test' dataset, which you did not use while building the model. When building the model, you use the 'train' dataset from the 'Split' module. By default, the threshold for binary classification on the 'Scored Labels' is 0.5.
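As a sketch of what that 0.5 threshold means, here is some illustrative Python (the probabilities and labels are made-up example data, not the Azure ML module itself): scored probabilities are cut at the threshold to get 0/1 labels, and accuracy is then computed on the held-out test set.

```python
# Sketch: how a 0.5 threshold turns scored probabilities into binary labels,
# and how accuracy is computed on a held-out test set.
# The probabilities and labels below are made-up illustration data.

def classify(scored_probs, threshold=0.5):
    """Convert scored probabilities into 0/1 labels at the given threshold."""
    return [1 if p >= threshold else 0 for p in scored_probs]

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

scored = [0.9, 0.3, 0.6, 0.2, 0.8]   # model's scored probabilities (test set)
truth  = [1,   0,   1,   0,   0  ]   # true labels

preds = classify(scored)             # default threshold 0.5
print(accuracy(preds, truth))        # 0.8 (4 of 5 correct)
```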
Here are some articles which might be helpful:
I am not sure what you mean by your first question, about evaluating the accuracy over a certain threshold on the training set and how to evaluate the training set before testing. Which article are you referring to?
Nelish on Tue, 23 Aug 2016 17:05:10
Thanks for your reply. I read this in the book "Successes and New Directions in Data Mining".
Hai Ning on Tue, 23 Aug 2016 18:21:15
Can you give the source and context of this phrase: "evaluating explorative heuristics of new test data"?
Jaya Mathew on Tue, 23 Aug 2016 18:22:02
I am not sure if your second question refers to tuning hyperparameters in the Azure ML context: https://msdn.microsoft.com/en-us/library/azure/dn905810.aspx?f=255&MSPPError=-2147217396
David E. Coleman on Tue, 23 Aug 2016 18:59:12
I think I can clarify, because I believe I am dealing with the same general issue. Depending on context, it is not always best to optimize the fit for a binary classification algorithm according to a *standard metric* (such as accuracy, precision, or recall) at an *arbitrary threshold* of 0.5. In actual use, the threshold may be deliberately set to a value far different from 0.5. Q1: Can a user optimize the fit for a different specified threshold? One could also ask: Q2: Can the fit be optimized for a different metric (such as maximum lift), or more generally, for a different loss function?
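To make Q1 concrete, here is a small Python sketch of what I mean: instead of fixing the threshold at 0.5, sweep candidate thresholds and keep the one that maximizes a metric of your choice. The data and the choice of F1 as the target metric are purely illustrative.

```python
# Sketch of Q1: sweep candidate thresholds and pick the one that
# maximizes a chosen metric (F1 here). Data is made-up illustration.

def f1_at_threshold(scored, truth, threshold):
    """F1 score of the 0/1 predictions produced at the given threshold."""
    preds = [1 if p >= threshold else 0 for p in scored]
    tp = sum(1 for p, a in zip(preds, truth) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(preds, truth) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(preds, truth) if p == 0 and a == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scored, truth, candidates):
    """Return the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: f1_at_threshold(scored, truth, t))

scored = [0.95, 0.70, 0.55, 0.40, 0.30, 0.10]
truth  = [1,    1,    0,    1,    0,    0]
grid = [i / 10 for i in range(1, 10)]       # 0.1, 0.2, ..., 0.9
print(best_threshold(scored, truth, grid))  # 0.4, not the default 0.5
```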
Example: Fraud detection may use a large dataset with a proportionally very small minority of records that reflect the fraudulent condition. We might want to use binary classification to estimate the likelihood of fraud on a per-transaction basis, sort those estimated probabilities in descending order, and start at the top to rank-order the priority for investigation... knowing full well that we will only get through a small fraction of the list. A high lift level at the top of the list ensures we are using our time most efficiently.
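The triage idea above can be sketched in a few lines of Python (the transaction scores and labels are invented for illustration): rank transactions by estimated fraud probability and measure lift, the fraud rate in the top slice divided by the overall fraud rate.

```python
# Sketch: rank transactions by estimated fraud probability (descending)
# and compute lift in the top slice.
# Lift = (fraud rate among the top k) / (overall fraud rate).
# Scores and labels are made-up illustration data.

def lift_at_top(scored, truth, top_n):
    """Lift achieved by investigating only the top_n highest-scored records."""
    ranked = sorted(zip(scored, truth), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(truth) / len(truth)
    return top_rate / base_rate

# 10 transactions, 2 fraudulent; overall fraud rate is 0.2
scored = [0.9, 0.1, 0.8, 0.2, 0.3, 0.15, 0.05, 0.4, 0.25, 0.35]
truth  = [1,   0,   1,   0,   0,   0,    0,    0,   0,    0]
print(lift_at_top(scored, truth, 2))   # 5.0: top 2 are all fraud vs 20% base rate
```

A lift of 5 means an investigator working the top of the list finds fraud five times as often as random selection would.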
Jaya Mathew on Wed, 24 Aug 2016 13:59:31
I do not think what you describe is currently supported when tuning the default hyperparameters in the built-in Azure ML modules, where only the standard metrics (accuracy, precision, recall) are tuned. You might need to write your own R/Python code to do such custom tuning.
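As a minimal sketch of what that custom code could look like (inside Azure ML it would live in an Execute Python Script module; the cost weights and data here are illustrative assumptions): grid-search the decision threshold to minimize an asymmetric loss where false negatives cost more than false positives.

```python
# Sketch of custom tuning: grid search over the decision threshold that
# minimizes a custom, asymmetric loss (false negatives cost more than
# false positives). All numbers are illustrative assumptions.

def custom_loss(scored, truth, threshold, fn_cost=5.0, fp_cost=1.0):
    """Cost-weighted error: a missed positive costs fn_cost, a false alarm fp_cost."""
    loss = 0.0
    for p, a in zip(scored, truth):
        pred = 1 if p >= threshold else 0
        if pred == 0 and a == 1:
            loss += fn_cost
        elif pred == 1 and a == 0:
            loss += fp_cost
    return loss

def tune_threshold(scored, truth, candidates):
    """Return the candidate threshold with the lowest custom loss."""
    return min(candidates, key=lambda t: custom_loss(scored, truth, t))

scored = [0.9, 0.6, 0.45, 0.3, 0.2, 0.1]
truth  = [1,   1,   1,    0,   0,   0]
grid = [i / 20 for i in range(1, 20)]       # 0.05, 0.10, ..., 0.95
print(tune_threshold(scored, truth, grid))  # 0.35: below the default 0.5
```

The same pattern extends to any loss function or metric: only the body of `custom_loss` changes.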
You can add your request for this additional functionality via the feedback forum: