Why don't we accept the results of a simulation study as evidence of a limitation of a method?
























I am working on mixture models. I have developed a new method based on the EM algorithm. I simulated data from a mixture model and applied my new method to it; the results are very satisfying. For comparison, I also fitted a non-mixture model, which gave inaccurate results, as expected. I used this as evidence that the non-mixture model (in a specific area) cannot deal with mixture dependency. Someone told me that this is not surprising, because the data come from a mixture. I already knew that; my aim was to make the reader aware of the importance of the mixture model and of how the non-mixture model fails in such cases. He then asked me to apply both the non-mixture and the mixture model to real data and compare the results. The data I used are generic (I just wanted to test the model on them, and I have no information about the experiment behind them). I have read that, for real data, we should understand the data or have strong background knowledge about them; otherwise the comparison is not fair. For example, suppose I fit models to data that I do not know very well. Suppose further that the first model, model A (non-mixture), fits different distributions (say, arbitrary Gaussian models) to the data, while the mixture model (model B) fits only one specific Gaussian mixture. Then model A may outperform model B. However, if we know our data well and fit the most appropriate mixture model, then model B is likely to fit the data better than model A.

My question is: why do we not trust a simulation study to illustrate our problem, when we are not interested in specific data or when we only have data with no experimental background? In other words, since I only need to illustrate one point, why are simulated data not enough?

New edit

In other words, is it fair to compare model A with model B when I do not have enough information about, or knowledge of, the data at hand? Poor knowledge of the data may make model A fit the data better than model B. I think that, in this case, a fair comparison is possible only if we know the data well and therefore fit the most appropriate model before the comparison. That is, to compare two models on real data, I should have enough knowledge about the data. Otherwise, if I fit the wrong model, even a mixture model, to real mixture data, the non-mixture model may fit the data better than the mixture model simply because I fitted the wrong mixture model. Is that correct? If so, even when the non-mixture model shows a better fit than the mixture model, it still gives me a wrong fit (because the data are a mixture). Hence, in this case, my simulated data are a good way to illustrate the limitation of the non-mixture model.
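The kind of comparison described in the question can be sketched as follows. This is only a minimal illustration, not the asker's method: it assumes a one-dimensional, two-component Gaussian mixture with hypothetical parameters (means -3 and 3, unit standard deviations, equal weights), uses a textbook EM with a crude quartile-based start, and compares the two fits by BIC. All function names are made up for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def simulate_mixture(n, weight, mu1, mu2, sigma, rng):
    """Draw n points from weight*N(mu1, sigma^2) + (1 - weight)*N(mu2, sigma^2)."""
    return [rng.gauss(mu1 if rng.random() < weight else mu2, sigma) for _ in range(n)]


def loglik_single_gaussian(data):
    """Log-likelihood of the maximum-likelihood single Gaussian (model A)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return sum(math.log(normal_pdf(x, mu, sigma)) for x in data)


def loglik_two_component_em(data, n_iter=200):
    """Log-likelihood after plain EM for a two-component Gaussian mixture (model B)."""
    n = len(data)
    ordered = sorted(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    # Crude start (an assumption): quartiles as means, overall spread, equal weights.
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    sigma = [sd, sd]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: probability that each point belongs to component 0.
        resp0 = []
        for x in data:
            p0 = w[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = w[1] * normal_pdf(x, mu[1], sigma[1])
            resp0.append(p0 / (p0 + p1))
        # M-step: responsibility-weighted weights, means and standard deviations.
        for k, resp in enumerate((resp0, [1.0 - r for r in resp0])):
            total = max(sum(resp), 1e-12)
            w[k] = total / n
            mu[k] = sum(r * x for r, x in zip(resp, data)) / total
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / total
            sigma[k] = max(math.sqrt(var), 1e-2)  # floor guards against collapse
    return sum(
        math.log(w[0] * normal_pdf(x, mu[0], sigma[0]) + w[1] * normal_pdf(x, mu[1], sigma[1]))
        for x in data
    )


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller is better."""
    return n_params * math.log(n_obs) - 2.0 * loglik


rng = random.Random(0)  # fixed seed so the run is reproducible
data = simulate_mixture(500, 0.5, -3.0, 3.0, 1.0, rng)

ll_single = loglik_single_gaussian(data)
ll_mixture = loglik_two_component_em(data)
bic_single = bic(ll_single, 2, len(data))    # mean, sd
bic_mixture = bic(ll_mixture, 5, len(data))  # 2 means, 2 sds, 1 weight

print(f"single Gaussian:   loglik={ll_single:.1f}  BIC={bic_single:.1f}")
print(f"two-comp. mixture: loglik={ll_mixture:.1f}  BIC={bic_mixture:.1f}")
```

Because the data here are generated from exactly the model that the mixture fit assumes, a better score for the mixture only shows that the comparison is set up as intended; it says nothing yet about data whose generating mechanism is unknown.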








  • You cannot trust a single simulation experiment; you need many repeated simulation experiments to show that the pattern persists across repetitions and across variations of the parameters of the model you simulate from.
    – Xi'an
    Jan 30 at 17:40












  • @Xi'an Thank you so much for your comments. I always trust you. If you mean that I have to apply my model to repeated simulated data sets, then yes, that is what I have done. If you mean applying my model to different simulation scenarios, then yes, I have done that too. My question is: if I have real data that I do not understand very well, and I fit the wrong mixture model to it, with a fixed, wrong number of components, then I will get a wrong result, and hence the non-mixture model may provide the better fit. Is that possible?
    – Maryam
    Jan 31 at 11:56
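The repeated-simulation design suggested in the comments above can be sketched like this. It is a toy setup under stated assumptions, not anyone's actual study: data are drawn from an equal-weight mixture of N(0, 1) and N(delta, 1), the separation `delta` is the parameter being varied, and for each setting the share of replications in which BIC prefers a two-component EM fit over a single Gaussian is recorded. The function names, sample size, number of replications and the three values of `delta` are all arbitrary choices made for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def bic_single(data):
    """BIC of the maximum-likelihood single Gaussian (2 parameters)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    loglik = sum(math.log(normal_pdf(x, mu, sigma)) for x in data)
    return 2 * math.log(n) - 2.0 * loglik


def bic_mixture(data, n_iter=60):
    """BIC of a two-component Gaussian mixture fitted by plain EM (5 parameters)."""
    n = len(data)
    ordered = sorted(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    # Crude start (an assumption): quartiles as means, overall spread, equal weights.
    mu, sigma, w = [ordered[n // 4], ordered[(3 * n) // 4]], [sd, sd], [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point.
        resp0 = []
        for x in data:
            p0 = w[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = w[1] * normal_pdf(x, mu[1], sigma[1])
            resp0.append(p0 / (p0 + p1))
        # M-step: update weight, mean and standard deviation of each component.
        for k, resp in enumerate((resp0, [1.0 - r for r in resp0])):
            total = max(sum(resp), 1e-12)
            w[k] = total / n
            mu[k] = sum(r * x for r, x in zip(resp, data)) / total
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / total
            sigma[k] = max(math.sqrt(var), 1e-2)  # floor guards against collapse
    loglik = sum(
        math.log(w[0] * normal_pdf(x, mu[0], sigma[0]) + w[1] * normal_pdf(x, mu[1], sigma[1]))
        for x in data
    )
    return 5 * math.log(n) - 2.0 * loglik


def mixture_win_rate(delta, n_obs, n_reps, rng):
    """Share of replications in which BIC prefers the mixture over the single Gaussian."""
    wins = 0
    for _ in range(n_reps):
        data = [rng.gauss(0.0 if rng.random() < 0.5 else delta, 1.0) for _ in range(n_obs)]
        if bic_mixture(data) < bic_single(data):
            wins += 1
    return wins / n_reps


rng = random.Random(1)  # fixed seed so the run is reproducible
rates = {
    delta: mixture_win_rate(delta, n_obs=200, n_reps=10, rng=rng)
    for delta in (0.5, 2.0, 6.0)
}
for delta, rate in rates.items():
    print(f"separation {delta}: mixture preferred by BIC in {rate:.0%} of replications")
```

Reporting a rate per parameter setting, rather than one run, is what lets a reader see whether the advantage of the mixture model persists or depends on how strongly the components are separated. A real study would use many more replications and more than one criterion.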
















simulation fitting mixture














edited Jan 29 at 20:48









Robert Long











asked Jan 28 at 7:02









Maryam



















1 Answer






























Simulation studies showing that a method does great when the data-generating model and the analysis model are the same are very common. What people really want to see is more general:

  1. The model performing well when the data-generating mechanism has all the complexity of real life. There is a lot of judgement involved here, but some aspects of the data-generating mechanism may have a much bigger impact than others. Simulations are actually great for exploring that, but are too often poorly done.

  2. Don't just knock down a strawman; compare against all the reasonable / frequently used methods. E.g. adjustment for covariates might make omitting a random effect less important.

  3. The differences in performance need to be striking enough that they truly matter in practice. A good example can also help here, to illustrate that one can get strikingly different conclusions.
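Point 1 above can be explored with a simulation in which the data-generating mechanism matches neither analysis model. The sketch below is only one hypothetical way to do that: the two components have Laplace rather than Gaussian noise (an arbitrary stand-in for "real-life complexity"), both a single Gaussian and a two-component Gaussian mixture are fitted to a training sample, and the fits are compared by average log-density on a separate test sample. The centres -4 and 4, the sample sizes and all function names are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def draw_laplace_mixture(n, rng):
    """Two equally weighted components centred at -4 and +4 with Laplace noise."""
    out = []
    for _ in range(n):
        centre = -4.0 if rng.random() < 0.5 else 4.0
        # Exponential magnitude with a random sign gives a Laplace(0, 1) draw.
        noise = rng.expovariate(1.0) * (1.0 if rng.random() < 0.5 else -1.0)
        out.append(centre + noise)
    return out


def fit_single(data):
    """ML single Gaussian, returned as a one-component list [(weight, mu, sigma)]."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return [(1.0, mu, sigma)]


def fit_mixture(data, n_iter=100):
    """Plain EM for a two-component Gaussian mixture; returns [(weight, mu, sigma), ...]."""
    n = len(data)
    ordered = sorted(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    # Crude start (an assumption): quartiles as means, overall spread, equal weights.
    mu, sigma, w = [ordered[n // 4], ordered[(3 * n) // 4]], [sd, sd], [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of component 0 for each point.
        resp0 = []
        for x in data:
            p0 = w[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = w[1] * normal_pdf(x, mu[1], sigma[1])
            resp0.append(p0 / (p0 + p1))
        # M-step: update weight, mean and standard deviation of each component.
        for k, resp in enumerate((resp0, [1.0 - r for r in resp0])):
            total = max(sum(resp), 1e-12)
            w[k] = total / n
            mu[k] = sum(r * x for r, x in zip(resp, data)) / total
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / total
            sigma[k] = max(math.sqrt(var), 1e-2)  # floor guards against collapse
    return list(zip(w, mu, sigma))


def mean_loglik(model, data):
    """Average log-density of data under a fitted (mixture of) Gaussian(s)."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, s) for w, m, s in model)) for x in data
    ) / len(data)


rng = random.Random(2)  # fixed seed so the run is reproducible
train = draw_laplace_mixture(500, rng)
test = draw_laplace_mixture(500, rng)

single, mixture = fit_single(train), fit_mixture(train)
score_single = mean_loglik(single, test)
score_mixture = mean_loglik(mixture, test)

print(f"held-out mean log-density, single Gaussian:   {score_single:.3f}")
print(f"held-out mean log-density, Gaussian mixture:  {score_mixture:.3f}")
```

Scoring on data that were not used for fitting keeps the more flexible model from being rewarded merely for having more parameters. Whether the resulting gap is large enough to matter in practice (point 3) still has to be judged on the scale of the actual application.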













  • Thank you so much for your answer. I appreciate it. I have edited my question.
    – Maryam
    Jan 28 at 7:40

  • I guess part of the problem is: why should anyone care that omitting a random effect from a model matters when there is one in the data-generating process, if they have no idea whether this can realistically occur in practice? Also, does the form of the random effect matter, etc.?
    – Björn
    Jan 28 at 7:42











answered Jan 28 at 7:26









Björn












