Making a dataframe of parent IDs

I am quite new to scripting, and this is the first code I have written. Its goal is to extract a parent ID from a child ID. I would appreciate constructive criticism and a thorough check. The code works and does what it is supposed to do, but how can I make it more Pythonic and more elegant?



import codecs

import numpy as np
import pandas as pd


def import_file(path):
    '''imports the .csv files as a pandas dataframe
    Args:
        path (csv file): Takes in the path for the .csv file

    Returns:
        Returns a pandas dataframe
    '''
    with codecs.open(path, "r", encoding='utf-8', errors='ignore') as fdata:
        df = pd.read_csv(fdata)
    return df


def appends_address_before_name(file):
    '''Appends the address before the name ID name
    Returns:
        Returns the file, with address appended to column name.
    '''
    file['ID'] = [address + str(col) for col in file['ID']]
    return file


def create_parent_name(file, column_name: str):
    '''This will create a parent name based on the ID column

    Args:
        file: takes in the dataFrame created from appends_address_before_name function

        column_name: Takes in the column name where the parent name
        will be extracted from. The logic is to split it on the last dot.
        [[parentname].[+ childname]]

    Returns:
        Returns a pandas dataframe with a new column called parentID
    '''
    file['parentID'] = [
        x.rsplit('.', 1)[0] if '.' in x else x[:-1] for x in file[column_name]
    ]

    return file


address = 'New_Jersey_'
file_1 = import_file(r'C:humans.csv')
file_2= appends_address_before_name(file=file_1)
file_3= create_parent_name(file=file_2 , column_name = 'ID')

print(file_3)


The input CSV has a single column of dot-separated values, like the following:



ID

99.99.9
100.42.3


Example output



parentID

New_Jersey_99.99
New_Jersey_100.42


Furthermore, I feel that the way I pass variables between the functions at the end of the code is quite basic and clumsy. What can I improve in the code above, and how?



[Screenshot of the code working]

  • @200_success I edited the question. I want to improve the logic of the code and how variables are passed between functions. The code works as expected; I was wondering whether it looks ready for production. – Matthew Dec 7 '18 at 13:57

  • Thanks for adding the explanation. Downvote retracted. – 200_success Dec 7 '18 at 14:02

  • Can you post a sample from your csv, including header? – Graipher Dec 7 '18 at 14:11

  • What is the reason behind dropping the last character if there is no '.' in the ID? It seems to result in weird things like 'NewJersey_'. – Graipher Dec 7 '18 at 14:25

  • @Graipher You are right, the parentID for this one should be just NewJersey. I added an extra underscore in the address; that's wrong. – Matthew Dec 7 '18 at 14:27

Tags: python, beginner, pandas

asked Dec 7 '18 at 13:03 by Matthew (edited Dec 7 '18 at 14:23)

1 Answer

You should decide whether your functions modify the object they receive or return a modified copy. Doing both is asking for trouble. After your code has finished, file_1, file_2 and file_3 all refer to the same, fully modified DataFrame.



The usual convention is to return None (implicitly or explicitly) if you mutate any of the inputs. In the rest of this answer I have decided to mutate the inputs.
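To make the difference concrete, here is a minimal sketch of the two styles. The helper names add_prefix_in_place and with_prefix and the sample data are made up for illustration; they do not appear in the question.

```python
import pandas as pd


def add_prefix_in_place(df, prefix):
    """Mutate df and return None (hypothetical helper)."""
    df["ID"] = prefix + df["ID"]


def with_prefix(df, prefix):
    """Leave df untouched and return a modified copy (hypothetical helper)."""
    return df.assign(ID=prefix + df["ID"])


original = pd.DataFrame({"ID": ["99.99.9", "100.42.3"]})

# Style 1: a new object comes back, the input is unchanged.
prefixed = with_prefix(original, "New_Jersey_")
print(prefixed is original)     # False
print(original["ID"].tolist())  # ['99.99.9', '100.42.3']

# Style 2: the input itself changes and nothing is returned.
result = add_prefix_in_place(original, "New_Jersey_")
print(result)                   # None
print(original["ID"].tolist())  # ['New_Jersey_99.99.9', 'New_Jersey_100.42.3']
```

Either style is fine on its own; the confusion only starts when a function mutates its argument and also returns it, because the caller then holds several names for one object.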



Besides that, pandas is most effective when you use its vectorized functions. For string columns it has a large set of vectorized methods, which you can access through the .str accessor (df.col_name.str). The pandas documentation has examples.
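As a small illustration of the .str accessor, the sketch below applies a few string methods to a whole column at once. The Series name ids and its values are sample data chosen to match the IDs in the question.

```python
import pandas as pd

ids = pd.Series(["99.99.9", "100.42.3", "101"])

# Each call below operates on every element of the column, without a Python loop.
print(ids.str.len().tolist())                       # [7, 8, 3]
print(ids.str.contains(".", regex=False).tolist())  # [True, True, False]
print(ids.str.count(r"\.").tolist())                # [2, 2, 0]
```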



Your appends_address_before_name function could be simplified a lot because string addition is vectorized:



def appends_address_before_name(file):
    # Relies on the module-level `address` and an existing "parentID" column.
    file["parentID"] = address + file["parentID"]
    file["ID"] = address + file["ID"]
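One thing the original list comprehension did that plain addition does not is the str(col) conversion. If every ID in a file happens to look like a number (say only 101 and 102), read_csv will most likely parse the column as integers, and adding a string prefix to an integer column is expected to fail. Below is a hedged sketch of a variant that converts first. The name prepend_address and the explicit address parameter (instead of the global variable) are my own assumptions, not part of the original code.

```python
import pandas as pd


def prepend_address(df, address):
    """Prefix the ID column in place; returns None (illustrative variant)."""
    # astype(str) mirrors the str(col) call in the original list comprehension.
    df["ID"] = address + df["ID"].astype(str)


numeric_ids = pd.DataFrame({"ID": [101, 102]})
prepend_address(numeric_ids, "New_Jersey_")
print(numeric_ids["ID"].tolist())  # ['New_Jersey_101', 'New_Jersey_102']
```

Passing the prefix as an argument also removes the hidden dependency on a module-level variable.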


And your create_parent_name function could be:



def create_parent_name(file, column_name: str):
    # Keep everything before the last dot; IDs without a dot give "".
    file["parentID"] = file[column_name].str.split(".").str[:-1].str.join(".")
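The two versions differ only for IDs without a dot, which is easy to check on plain strings. In this sketch parent_original copies the expression from the question, and parent_split_join is the per-element equivalent of the vectorized line above; both helper names are mine.

```python
def parent_original(x):
    # Expression from the question: drop the last character if there is no dot.
    return x.rsplit(".", 1)[0] if "." in x else x[:-1]


def parent_split_join(x):
    # Per-element equivalent of .str.split(".").str[:-1].str.join(".")
    return ".".join(x.split(".")[:-1])


for child in ["99.99.9", "100.42.3", "101"]:
    print(child, "->", repr(parent_original(child)), repr(parent_split_join(child)))
# 99.99.9 -> '99.99' '99.99'
# 100.42.3 -> '100.42' '100.42'
# 101 -> '10' ''
```

For an ID such as 101, the original expression silently truncates it to 10, while the split/join version yields an empty parent.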


With a csv file like this:



ID
99.99.9
100.42.3
101


This produces



df = import_file(file_name)
create_parent_name(df, 'ID')
appends_address_before_name(df)
print(df)
#                     ID           parentID
# 0   New_Jersey_99.99.9   New_Jersey_99.99
# 1  New_Jersey_100.42.3  New_Jersey_100.42
# 2       New_Jersey_101        New_Jersey_


Note that the order of the calls has changed: the parent ID is derived first and the address is added afterwards, so that IDs without a . are handled correctly.
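For anyone who wants to try the whole pipeline without a file on disk, here is a self-contained sketch. It reads the sample CSV from an in-memory string, and it passes the prefix as a parameter instead of using a global; the names prepend_address, ADDRESS and csv_text are assumptions made for this example.

```python
import io

import pandas as pd

ADDRESS = "New_Jersey_"  # same prefix as in the question


def create_parent_name(df, column_name):
    """Add a parentID column in place: everything before the last dot."""
    df["parentID"] = df[column_name].str.split(".").str[:-1].str.join(".")


def prepend_address(df, address):
    """Prefix both ID columns in place; returns None."""
    df["parentID"] = address + df["parentID"]
    df["ID"] = address + df["ID"]


# In-memory stand-in for the CSV file, so the sketch runs as-is.
csv_text = "ID\n99.99.9\n100.42.3\n101\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

create_parent_name(df, "ID")   # derive the parent first ...
prepend_address(df, ADDRESS)   # ... then add the prefix to both columns
print(df["parentID"].tolist())
# ['New_Jersey_99.99', 'New_Jersey_100.42', 'New_Jersey_']
```

Reading with dtype=str is a deliberate choice here: it keeps an ID like 101 as text even if a file contains nothing but such IDs.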





As for general structure:




  • Seeing docstrings is very nice (I omitted them here for brevity)

  • Python has an official style guide, PEP 8. It recommends writing x = some_thing(a=3): surround the equals sign with spaces when assigning, but not when passing keyword arguments.

  • You should wrap the main calling code in an if __name__ == "__main__": guard.
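The last two points can be sketched together as follows. The function build_parent_id and the main function are hypothetical and work on plain strings only to keep the example short; the points being shown are the spacing around = and the guard at the bottom.

```python
def build_parent_id(child_id, address="New_Jersey_"):
    """Return the prefixed parent of a dot-separated child ID."""
    parent = ".".join(child_id.split(".")[:-1])
    return address + parent


def main():
    # Spaces around "=" in assignments, none around "=" in keyword arguments.
    child_ids = ["99.99.9", "100.42.3"]
    for child_id in child_ids:
        print(build_parent_id(child_id=child_id))


# Runs only when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    main()
```

With the guard in place, the functions can be imported from another module or a test without triggering the file reading and printing.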






  • Can you kindly elaborate on "modify the object they receive or return a modified object"? Where in my code am I returning or modifying an object? – Matthew Dec 7 '18 at 14:28

  • So you removed the variables, e.g. file_2 in file_2= appends_address_before_name(file=file_1), because you modify the original file rather than returning it? – Matthew Dec 7 '18 at 14:29

  • @Matthew: When you do file["ID"] = ... you modify the file object in place. Afterwards you return file, which is still the same object, but modified. – Graipher Dec 7 '18 at 14:30

  • The concept of returning a file versus modifying it in place was unknown to me, thanks. What else? Are the docstrings okay? Is the overall structure okay? – Matthew Dec 7 '18 at 14:38

  • @Matthew: I figured out how to get exactly your behaviour and added some comments on general structure. – Graipher Dec 7 '18 at 14:41

answered Dec 7 '18 at 14:19 by Graipher (edited Dec 7 '18 at 15:47)