Why we do not accept the result of our simulation study as evidence of a limitation of one method
I am working on a mixture model, and I have developed a new estimation method based on the EM algorithm. I simulated data from a mixture model and applied my new method to it; the results are very satisfactory. For comparison, the non-mixture model gives inaccurate results on the same data, as expected. I used this as evidence that the non-mixture model (in a specific setting) cannot deal with mixture dependency. Someone told me that this is not surprising, since the data are mixture data. I already knew that, but my aim was to make the reader aware of the importance of the mixture model and of how the non-mixture model fails in these cases. He then asked me to apply both the non-mixture and the mixture model to real data and compare the results. The data I have used are generic (I just wanted to test the model on them; I have no experimental information about them). I have read that for real data we should understand the data, or have a strong background on them, otherwise the comparison is not fair. For example, suppose I fit models to data that I do not know very well. Suppose further that the first model (model A, non-mixture) can fit various distributions (say, arbitrary Gaussian models) to the data, while the mixture model (model B) fits only one specific Gaussian mixture. Then it is possible that model A outperforms model B. However, if we know our data well and fit the most appropriate mixture model, the probability that model B fits the data better than model A is high.
My question is: why do we not trust a simulation study to illustrate the problem when we are not interested in a specific data set, or only have data with no experimental background? In other words, since I need to illustrate a single point, why are simulated data not enough?
New edit
In other words: is it fair to compare model A with model B when I do not have enough information or knowledge about the data at hand? That lack of knowledge may make model A fit the data better than model B. I think a fair comparison can only hold if we know the data well and therefore fit the most appropriate model to them before comparing. That is, to compare two models on real data, I should have enough knowledge about the data. Otherwise, if I fit the wrong model to real mixture data, even a misspecified mixture model, then the non-mixture model may fit the data better than the mixture model simply because I fitted the wrong mixture model. Is that correct? In that case, even though the non-mixture model shows a better fit than the mixture model, it still gives me a wrong fit (because the data are a mixture). Hence, my simulated data are good for illustrating the limitation of the non-mixture model.
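To make the setup concrete, here is a minimal, self-contained sketch of the kind of comparison described above: data simulated from a two-component Gaussian mixture (with made-up parameters, not those of my actual model), fitted both by a single Gaussian and by a basic two-component EM, and compared by BIC:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n points from the mixture 0.4 * N(-2, 1) + 0.6 * N(3, 1)
n = 2000
z = rng.random(n) < 0.4
x = np.where(z, rng.normal(-2.0, 1.0, n), rng.normal(3.0, 1.0, n))

def normal_logpdf(x, mu, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - (x - mu)**2 / (2 * sd**2)

# Model A (non-mixture): single Gaussian, whose MLE is the sample mean and sd
loglik_a = normal_logpdf(x, x.mean(), x.std()).sum()

# Model B (mixture): two-component Gaussian mixture fitted by a basic EM
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sd = np.array([1.0, 1.0])
for _ in range(300):
    # E-step: responsibilities r[i, k] = P(component k | x_i)
    log_comp = np.log(w) + np.column_stack(
        [normal_logpdf(x, mu[k], sd[k]) for k in range(2)])
    log_mix = np.logaddexp(log_comp[:, 0], log_comp[:, 1])
    r = np.exp(log_comp - log_mix[:, None])
    # M-step: responsibility-weighted weights, means and sds
    nk = r.sum(axis=0)
    w = nk / n
    mu = (r * x[:, None]).sum(axis=0) / nk
    sd = np.sqrt((r * (x[:, None] - mu)**2).sum(axis=0) / nk)
loglik_b = log_mix.sum()

# BIC = k*log(n) - 2*loglik (lower is better); the mixture has 5 free
# parameters (2 means, 2 sds, 1 weight) against 2 for the single Gaussian
bic_a = 2 * np.log(n) - 2 * loglik_a
bic_b = 5 * np.log(n) - 2 * loglik_b
print(f"BIC single Gaussian: {bic_a:.1f}, BIC mixture: {bic_b:.1f}")
```

On components as well separated as these, the mixture's BIC should be markedly lower (better) than the single Gaussian's; on real data with unknown structure, that margin is exactly what is in question.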
Tags: simulation, fitting, mixture
You cannot trust one simulation experiment, but need many repeated simulation experiments to show that the pattern persists across repetitions and variations of the parameters of the model you use to simulate.
– Xi'an
Jan 30 at 17:40
@Xi'an Thank you so much for your comments; I always trust you. If you mean that I have to apply my model to repeated simulated data sets, then yes, that is what I have done. If you mean applying my model to different simulation scenarios, then yes, I have done that too. My question is: if I have real data that I do not understand very well, and I fit the wrong mixture model to it, with a fixed but wrong number of components, then I will get a wrong result, and hence the non-mixture model may provide the better fit. Is that possible?
– Maryam
Jan 31 at 11:56
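Xi'an's suggestion of many repeated simulation experiments can be sketched as follows. This is a toy setup with assumed parameters; to keep the sketch short, the mixture is evaluated at its true parameters rather than refitted in each replication:

```python
import numpy as np

rng = np.random.default_rng(1)

def logpdf(x, mu, sd):
    return -0.5 * np.log(2 * np.pi * sd**2) - (x - mu)**2 / (2 * sd**2)

def one_replication(n=500):
    # One simulated data set from 0.5 * N(-2, 1) + 0.5 * N(2, 1)
    z = rng.random(n) < 0.5
    x = np.where(z, rng.normal(-2, 1, n), rng.normal(2, 1, n))
    # Non-mixture fit: single-Gaussian MLE (sample mean and sd)
    ll_single = logpdf(x, x.mean(), x.std()).sum()
    # Well-specified mixture, here evaluated at the true parameters:
    # log(0.5*f1 + 0.5*f2) = log 0.5 + logaddexp(log f1, log f2)
    ll_mix = np.logaddexp(logpdf(x, -2, 1), logpdf(x, 2, 1)).sum() + n * np.log(0.5)
    return ll_mix > ll_single

reps = 200
wins = sum(one_replication() for _ in range(reps))
print(f"mixture wins in {wins}/{reps} replications")
```

With components this far apart, the mixture wins in essentially every replication; that persistence across repetitions (and across variations of the simulation parameters) is the pattern Xi'an asks for.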
edited Jan 29 at 20:48
Robert Long
asked Jan 28 at 7:02
Maryam
1 Answer
Simulation studies showing that a method does great when the data-generating model and the analysis model coincide are very common. What people really want to see is more general:
- The model performing well when the data-generating mechanism has all the complexity of real life. There is a lot of judgement here, and some aspects of the data-generating mechanism may have a much bigger impact than others. Simulations are actually great for exploring that, but are too often poorly done.
- Don't just knock down a strawman; compare against all the reasonable / frequently used methods. E.g., adjusting for covariates might make omitting a random effect less important.
- The differences in performance need to be striking enough that they truly matter in practice. A good real-data example can also help illustrate that one can reach strikingly different conclusions.
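The last point can be made concrete: how much the mixture "truly matters" depends heavily on how separated the components are. A rough Monte Carlo sketch with a hypothetical symmetric two-component mixture (components at ±d/2, unit variance; the moment-matched single Gaussian then has mean 0 and variance 1 + d²/4):

```python
import numpy as np

rng = np.random.default_rng(2)

def mix_logpdf(x, d):
    # log density of 0.5 * N(-d/2, 1) + 0.5 * N(d/2, 1)
    a = np.exp(-0.5 * (x + d / 2)**2)
    b = np.exp(-0.5 * (x - d / 2)**2)
    return np.log(0.5 * (a + b) / np.sqrt(2 * np.pi))

def gauss_logpdf(x, mu, var):
    return -0.5 * np.log(2 * np.pi * var) - (x - mu)**2 / (2 * var)

def advantage(d, n=200_000):
    # Per-observation log-likelihood gain of the true mixture over the
    # best-matching single Gaussian (mean 0, variance 1 + d^2/4),
    # estimated by Monte Carlo under the mixture
    z = rng.random(n) < 0.5
    x = np.where(z, rng.normal(-d / 2, 1, n), rng.normal(d / 2, 1, n))
    return (mix_logpdf(x, d) - gauss_logpdf(x, 0.0, 1 + d * d / 4)).mean()

adv_small = advantage(0.5)
adv_big = advantage(6.0)
print(f"per-observation gain: d=0.5 -> {adv_small:.4f}, d=6 -> {adv_big:.4f}")
```

With the components close together (d = 0.5), the per-observation gain of the true mixture over the single Gaussian is negligible, while at d = 6 it is substantial: the practical payoff of modelling the mixture hinges on the separation, not just on the data "being a mixture".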
Thank you so much for your answer. I appreciate it. I have edited my question.
– Maryam
Jan 28 at 7:40
I guess part of the problem is: why should anyone care that omitting a random effect from a model matters, when there is one in the data-generating process, if they have no idea whether this can realistically occur in practice? Also, does the form of the random effect matter, etc.
– Björn
Jan 28 at 7:42
answered Jan 28 at 7:26
Björn