Linux command line for a large data set?

The problem: I have a large Excel file of data with over 1,000 columns and over 40,000 rows. I need to identify the rows that have a value of >199 in any cell. Any row that does not have a value of >199 in any cell should be deleted, so that I am left with only rows where at least one cell has a value of >199.



I also have the same data file as a text file, so I was thinking that the best approach might be to use the Linux command line for this rather than the Excel file (which is bulky to work with given the number of rows and columns). But I am a novice at Linux and awk, so I am looking for general advice on how to approach this issue.



Thank you for your help.



An example image of the data set is below. Here I would want only the rows that have highlighted cells (because those are >200), but I can't just use the sort function or complicated if/then statements, because there are so many columns in my data set that that would be too time-consuming.



[screenshot of the data set, with the cells over the threshold highlighted]

Tags: microsoft-excel-2010

asked Nov 27 at 0:45 (edited Nov 27 at 1:30)
– Anna

  • I don't understand. Is this a question about Excel? Have you exported your data from Excel to a text file? Details, please. Please do not respond in comments; edit your question to make it clearer and more complete.
    – Scott
    Nov 27 at 1:02










  • Anna did describe her/his case clearly. She/he has "the same data file as a text file" and wants advice on how to approach parsing this txt file on the Linux command line, and which combination of tools to use for that: should several commands be piped together, or is some other approach better? I am interested in this also. Thank you for the question.
    – titus
    Nov 27 at 1:28

    @titus: I would expect somebody who has been on the site for eight years to know about edits and the revision history. If you look, you'll see that the "same data file as a text file" statement was added after I made my comment.
    – Scott
    Nov 27 at 3:09

2 Answers

Since you said you are "looking for general advice of how to approach this issue", here is one approach:



If you know how to use Python, you could save the file as a comma-separated file, write a small script that reads it with the csv module, and filter the rows from there. You can use any operating system that supports Python; a sketch of the idea is below.
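
For illustration, a minimal sketch of that approach, assuming the sheet was exported as a file named data.csv with a header row (the file name, the header assumption, and the output name filtered.csv are mine, not from the question):

import csv

# Keep only the rows in which at least one cell parses as a number > 199.
# Match the > 199 test to your final requirement (see the notes in the
# other answer about 199 vs. 200).
with open("data.csv", newline="") as src, \
     open("filtered.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader))           # copy the header row unchanged
    for row in reader:
        values = []
        for cell in row:
            try:
                values.append(float(cell))  # non-numeric cells are ignored
            except ValueError:
                pass
        if any(v > 199 for v in values):
            writer.writerow(row)

Reading and writing one row at a time keeps memory use small, which matters with 40,000 rows of 1,000 columns.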

answered Nov 27 at 1:29
– Elmo

  • I am just learning Python, so I haven't written a script before. What script would do this?
    – Anna
    Nov 27 at 1:34










  • You could, for example, use the "xlrd" module, in particular its sheet.nrows and sheet.row_values(n) methods. A small tutorial for using xlrd is described on geeksforgeeks.org/reading-excel-file-using-python
    – Christoph Sommer
    Nov 27 at 1:46
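
To make that suggestion concrete, here is a minimal sketch using xlrd's sheet.nrows and sheet.row_values(n) methods, assuming the workbook is named data.xls with the data on its first sheet (both assumptions mine; note that xlrd 2.x reads only legacy .xls files, not .xlsx):

import xlrd  # third-party module: pip install xlrd

book = xlrd.open_workbook("data.xls")
sheet = book.sheet_by_index(0)      # assumes the data is on the first sheet

for n in range(sheet.nrows):
    row = sheet.row_values(n)       # cell values for row n, as a list
    numbers = [v for v in row if isinstance(v, (int, float))]
    if any(v > 199 for v in numbers):
        print(row)                  # keep rows with at least one value > 199

This skips the CSV export step entirely, at the cost of an extra dependency.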

I don't have much general advice. 
Specifically, I advise you to use this awk command:



awk '{
over = 0
for (i = 1; i <= NF; i++)    # scan every whitespace-separated field
    if ($i > 199) over = 1   # note when a field exceeds 199
if (over) print              # print the line only if at least one did
}'


I saved that command in an executable script, myscript, and created a small data file
based on numbers from your file, and a few I made up on my own:



$ cat input
81 23 40
31 0 416 12
2 2 1
157 41 80 201
417 42 17

$ ./myscript input
31 0 416 12
157 41 80 201
417 42 17


To delete rows from your file, do



$ ./myscript input > input.new
$ mv input.new input


Notes:




  • For your own sake,
    you should decide whether your requirement is > 199, > 200, ≥ 200, or what.

  • If you need to keep Row 1 (i.e., line 1, the header row), say so.

  • I haven't tested this on a large file. 
    awk shouldn't have any problem with a huge number of rows (lines). 
    A thousand columns (fields) might be an issue, but I doubt it.

answered Nov 27 at 3:05
– Scott