Removing variable with big p-value?
I ran a regression with 2 explanatory variables. The regression summary shows that one of my variables has a large p-value (0.705). Should I include that variable when writing the y-hat equation?
statistics linear-regression p-value
asked Nov 14 at 23:21
Camue
2 Answers
accepted
This depends on your expected results. In your case you have only 2 features, and if you remove one of them, the chance of losing important information is really high.
Instead of removing the insignificant feature, you should try to improve it by detecting anomalies or dropping outliers. A common approach is to plot the covariance matrix to see how strongly the features are related; you can also analyze boxplots and adjust the outlier threshold to get more reliable data.
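As a minimal sketch of the boxplot idea, assuming the usual whisker rule (points further than k times the interquartile range from the quartiles are flagged), the threshold k can be adjusted like this. The function name `iqr_outliers` and the sample values are made up for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    k is the adjustable threshold; 1.5 is the conventional boxplot whisker.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical feature values with one suspicious observation
feature = [2.1, 2.4, 2.2, 2.8, 2.5, 9.7, 2.3, 2.6]
print(iqr_outliers(feature))          # flagged with the default threshold
print(iqr_outliers(feature, k=20.0))  # a very loose threshold flags nothing
```

Whether a flagged point should actually be dropped is a judgment call about the data, not something the rule decides on its own.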
If you have enough data, you can split it into training, validation and test sets. Then you can improve your model coefficients by using some voting methods on the validation set.
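The split itself could look roughly like the following. This is only a sketch: the 60/20/20 proportions, the seed, and the name `split_indices` are assumptions, not something fixed by the method.

```python
import random

def split_indices(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Shuffle the row indices 0..n-1 and cut them into three disjoint parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # seeded so the split is reproducible
    n_train = round(n * frac_train)
    n_val = round(n * frac_val)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]       # whatever is left over
    return train, val, test

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The model is fitted on the training rows, choices such as "keep or drop the variable" are made on the validation rows, and the test rows are only used once at the end.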
Finally, you can compute summary results such as R-squared and p-values, and run comparisons such as an ANOVA test or AIC scores, to compare the two cases (with and without the variable).
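To make the comparison concrete, here is a small sketch with NumPy on simulated data. The data, the helper names (`fit_ols`, `r_squared`, `gaussian_aic`) and the choice to write AIC up to an additive constant are all assumptions for illustration; your own data would replace `x1`, `x2` and `y`.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns (coefficients, RSS)."""
    A = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return beta, float(resid @ resid)

def r_squared(rss, y):
    """Share of the variance of y explained by the fit."""
    tss = float(((y - y.mean()) ** 2).sum())
    return 1.0 - rss / tss

def gaussian_aic(rss, n, k):
    """AIC for a Gaussian linear model with k coefficients, up to a constant."""
    return n * np.log(rss / n) + 2 * k

# Simulated example: x2 has no true effect on y
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

beta_full, rss_full = fit_ols(np.column_stack([x1, x2]), y)  # y ~ x1 + x2
beta_red, rss_red = fit_ols(x1, y)                           # y ~ x1

print("R^2 full / reduced:", r_squared(rss_full, y), r_squared(rss_red, y))
print("AIC full / reduced:", gaussian_aic(rss_full, n, 3), gaussian_aic(rss_red, n, 2))
```

R-squared can never go down when a variable is added, so on its own it always favours the larger model; AIC adds a penalty per coefficient, which is why it is the more useful of the two for this question.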
answered Nov 15 at 6:53
AnNg
Thanks. Very helpful. Could you list some of the voting methods?
– Camue
Nov 17 at 10:24
This depends on the goal of your analysis. Have you made a hypothesis that both your explanatory variables affect the dependent variable? In this case you shouldn't remove the variable, since you'd be modifying your regression a posteriori (that is, after you've collected your data).
Are you trying to make a descriptive statement about what you're analyzing? For example, are you trying to understand whether education and sex predict income? Here, too, you shouldn't drop a variable, since you'd no longer be able to report that one of the two variables shows no detectable effect.
Finally, are you trying to make a prediction? In this case, it's appropriate to try both models and compare their performance. You can do this using an F-test/ANOVA.
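For reference, the two candidate equations and the F statistic can be written as follows. This is a sketch in notation that is not in the question: $x_2$ stands for the variable with the large p-value, $n$ for the number of observations, and $\mathrm{RSS}$ for the residual sum of squares of each fit.

```latex
% Full model (both variables kept)
\hat{y} = b_0 + b_1 x_1 + b_2 x_2

% Reduced model (x_2 dropped); the coefficients are re-estimated,
% so in general b_0' \neq b_0 and b_1' \neq b_1
\hat{y} = b_0' + b_1' x_1

% Partial F statistic for dropping one variable from a model with
% an intercept and two predictors
F = \frac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}) / 1}
         {\mathrm{RSS}_{\text{full}} / (n - 3)}
```

When only a single variable is dropped, this $F$ equals the square of that coefficient's t statistic in the full model, so the F-test reproduces the p-value already shown in the summary (0.705 here). Also note that the reduced equation must come from refitting the model, not from deleting the $b_2 x_2$ term from the full equation.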
answered Nov 14 at 23:33
fny