Making a dataframe of parent IDs
I am quite new to scripting, and this is the first code I have written. The goal of the code is to extract a parent ID based on a child ID. I would appreciate some constructive criticism and a thorough review. The code works and does what it's supposed to do, but how can I make it more Pythonic?

    import codecs
    import numpy as np
    import pandas as pd

    def import_file(path):
        '''imports the .csv files as a pandas dataframe
        Args:
            path (csv file): Takes in the path for the .csv file
        Returns:
            Returns a pandas dataframe
        '''
        with codecs.open(path, "r", encoding='utf-8', errors='ignore') as fdata:
            df = pd.read_csv(fdata)
        return df

    def appends_address_before_name(file):
        '''Appends the address before the name ID name
        Returns:
            Returns the file, with address appened to column name.
        '''
        file['ID'] = [address + str(col) for col in file['ID']]
        return file

    def create_parent_name(file, column_name: str):
        '''This will create a parent name based on the ID column
        Args:
            file: takes in the dataFrame created from appends_address_before_name function
            column_name: Takes in the column name where the parent name
                will be extracted from.The logic is to split it on the last dot.
                [[parentname].[+ childname]]
        Returns:
            Returns a pandas dataframe with a new column called parentID
        '''
        file['parentID'] = [
            x.rsplit('.', 1)[0] if '.' in x else x[:-1] for x in file[column_name]
        ]
        return file

    address = 'New_Jersey_'
    file_1 = import_file(r'C:humans.csv')
    file_2= appends_address_before_name(file=file_1)
    file_3= create_parent_name(file=file_2 , column_name = 'ID')
    print(file_3)

The input CSV has a single column of dot-separated values, like the following:

    ID
    99.99.9
    100.42.3

Example output:

    parentID
    New_Jersey_99.99
    New_Jersey_100.42

Furthermore, I feel that the way I am passing variables between the functions at the end of the code is quite basic and clumsy. What can I improve in the above code, and how?

python beginner pandas
asked Dec 7 '18 at 13:03 by Matthew; edited Dec 7 '18 at 14:23
@200_success I edited the question. On the other hand, I want to improve the logic of the code and how variables are passed between functions. The code works as expected. I was wondering if this looks like something ready for production.
– Matthew Dec 7 '18 at 13:57

Thanks for adding the explanation. Downvote retracted.
– 200_success Dec 7 '18 at 14:02

Can you post a sample from your csv, including header?
– Graipher Dec 7 '18 at 14:11

What is the reason behind dropping the last character if there is no `'.'` in the ID? It seems to result in weird things like `'NewJersey_'`.
– Graipher Dec 7 '18 at 14:25

@Graipher you are right, the parentID for this one will be just `NewJersey`. I added an extra underscore in the `address`, that's wrong.
– Matthew Dec 7 '18 at 14:27
1 Answer
You should decide whether your functions modify the object they receive or return a modified object. Doing both is just asking for disaster. After your code has finished, `file_1`, `file_2` and `file_3` are all identical, because they all refer to the same, modified object.

The usual convention is to return `None` (implicitly or explicitly) if you mutate any of the inputs. In the rest of this answer I have decided to mutate the inputs.
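To make the two styles concrete, here is a minimal sketch that is not part of the original code: `add_prefix_in_place` and `with_prefix` are hypothetical names, and the small in-memory DataFrame stands in for the CSV file. The first function mutates its argument and returns `None`; the second leaves its argument alone and returns a new DataFrame.

```python
import pandas as pd


def add_prefix_in_place(df, prefix):
    """Mutate df by prepending prefix to the ID column; returns None."""
    df["ID"] = prefix + df["ID"]


def with_prefix(df, prefix):
    """Leave df untouched and return a new DataFrame with prefixed IDs."""
    return df.assign(ID=prefix + df["ID"])


# Hypothetical sample data standing in for the CSV from the question.
original = pd.DataFrame({"ID": ["99.99.9", "100.42.3"]})

# Non-mutating style: the caller gets a new object, the input is unchanged.
prefixed = with_prefix(original, "New_Jersey_")
unchanged_ids = original["ID"].tolist()

# Mutating style: nothing is returned, the input itself changes.
result = add_prefix_in_place(original, "New_Jersey_")
mutated_ids = original["ID"].tolist()
```

Either style is fine on its own; the point is that a caller can tell from the return value which one a function follows.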
Besides that, `pandas` is most effective if you use its vectorized functions. For columns with strings, it has a whole lot of vectorized methods, which you can access with `df.col_name.str`. You can find some examples in the documentation.

Your `appends_address_before_name` function could be simplified a lot because string addition is vectorized:

    def appends_address_before_name(file):
        file["parentID"] = address + file["parentID"]
        file["ID"] = address + file["ID"]

And your `create_parent_name` function could be:

    def create_parent_name(file, column_name: str):
        file["parentID"] = file[column_name].str.split(".").str[:-1].str.join(".")
With a csv file like this:

    ID
    99.99.9
    100.42.3
    101

This produces

    df = import_file(file_name)
    create_parent_name(df, 'ID')
    appends_address_before_name(df)
    print(df)
    #                     ID           parentID
    # 0   New_Jersey_99.99.9   New_Jersey_99.99
    # 1  New_Jersey_100.42.3  New_Jersey_100.42
    # 2       New_Jersey_101        New_Jersey_

Note that the order of the calls has changed, so that IDs without a `.` are handled correctly.
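The same steps can be tried without a file on disk. The following is only a sketch under assumptions: `add_parent_id` and its `prefix` parameter are hypothetical (the functions above read a global `address` instead), and the CSV content is inlined through `io.StringIO`.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file shown above.
CSV_TEXT = "ID\n99.99.9\n100.42.3\n101\n"


def add_parent_id(df, column_name, prefix):
    """Mutate df: add a parentID column and prefix both ID columns."""
    # Drop the last dot-separated component; IDs without a dot become "".
    parent = df[column_name].str.split(".").str[:-1].str.join(".")
    df["parentID"] = prefix + parent
    df[column_name] = prefix + df[column_name]


# dtype=str keeps every ID as text, even if all values look numeric.
df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)
add_parent_id(df, "ID", "New_Jersey_")
print(df)
```

Passing the prefix as an argument rather than relying on a module-level variable is a design choice of this sketch; it makes the function usable with more than one prefix.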
As for general structure:

- Seeing docstrings is very nice (I omitted them here for brevity).
- Python has an official style-guide, PEP8. It recommends writing `x = some_thing(a=3)`, so surround equal signs with spaces when assigning but not when setting keyword arguments.
- You should wrap the main calling code in a `if __name__ == "__main__"` guard.
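As a sketch of the last point, the calling code could live in a `main` function behind the guard. The names `build_parent_ids` and `main` and the inline sample data are assumptions for illustration; the original script reads a CSV file instead.

```python
import pandas as pd


def build_parent_ids(ids, prefix):
    """Return a new DataFrame with prefixed ID and parentID columns."""
    ids = pd.Series(ids, dtype=str)
    # Everything before the last dot is the parent part of the ID.
    parents = ids.str.split(".").str[:-1].str.join(".")
    return pd.DataFrame({"ID": prefix + ids, "parentID": prefix + parents})


def main():
    # Sample data stands in for the CSV file used in the question.
    df = build_parent_ids(["99.99.9", "100.42.3"], prefix="New_Jersey_")
    print(df)


# Runs only when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    main()
```

With this layout the functions can be imported from another module or a test without triggering the file handling and printing.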
answered Dec 7 '18 at 14:19 by Graipher; edited Dec 7 '18 at 15:47
Can you kindly elaborate on "object they receive or if they return a modified object"? Where in my code am I returning or modifying an object?
– Matthew Dec 7 '18 at 14:28

So you removed the variables, e.g. `file_2` as in `file_2= appends_address_before_name(file=file_1)`, because you modified the original file rather than returning it?
– Matthew Dec 7 '18 at 14:29

@Matthew: When you do `file["ID"] = ...` you modify the `file` object in place. Afterwards you return `file`, which is still the same object, but modified.
– Graipher Dec 7 '18 at 14:30

The concept of returning a file or modifying it in place was unknown to me, thanks. What else? Are the docstrings okay? Is the overall structure okay?
– Matthew Dec 7 '18 at 14:38

@Matthew: I figured out how to get exactly your behaviour and added some comments on general structure.
– Graipher Dec 7 '18 at 14:41