Adding new column to existing DataFrame in Python pandas

Contents

1 Questions
2 Answers
3 Source
4 Related

Questions

I have a DataFrame with named columns and rows indexed with non-contiguous numbers, like the one produced by this code:

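import numpy as np
from pandas import DataFrame, Series

df1 = DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
mask = df1 < -0.7                 # element-wise boolean mask, same as applymap(lambda x: x < -0.7)
df1 = df1[~mask.any(axis=1)]      # drop every row that contains a value below -0.7
sLength = len(df1['a'])
e = Series(np.random.randn(sLength))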

I would like to add a new column, 'e', to the existing DataFrame without changing anything else in it. (The Series always has the same length as the DataFrame.) I tried different versions of join, append, and merge, but none of them gave me what I wanted; at best I got errors.

The Series and the DataFrame are already given; the code above only illustrates the situation with an example.

I am sure there is an easy way to do that, but I can't figure it out.

Answers

Use the original df1 index to create the Series:

df1['e'] = Series(np.random.randn(sLength), index=df1.index)
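Passing index=df1.index matters because pandas aligns a Series to a DataFrame by index label, not by position. The minimal sketch below illustrates this under stated assumptions: the tiny frame with index [6, 8] is hypothetical and only mimics the filtered df1 from the question, and the column name e_misaligned is made up for the comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical filtered frame: its index labels are not 0..n-1 because rows were dropped.
df1 = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0]}, index=[6, 8])
values = np.array([10.0, 20.0])

# A Series built without an index gets the default RangeIndex 0, 1.
e_default = pd.Series(values)
# Assignment aligns by label: 6 and 8 do not exist in 0..1, so every row gets NaN.
df1['e_misaligned'] = e_default

# Reusing df1.index makes the labels match, so the values land row by row.
df1['e'] = pd.Series(values, index=df1.index)

print(df1)
```

The printed frame should show NaN in e_misaligned and the original values in e, which is the usual symptom when a freshly created Series "does not fit" a filtered DataFrame.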

  Edit 2015

Some users reported getting a SettingWithCopyWarning with this code.
However, the code still runs perfectly with the current pandas version, 0.16.1.

>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = p.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> p.version.short_version
'0.16.1'

The SettingWithCopyWarning is meant to warn about a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did something wrong (it can produce false positives), but since 0.13.0 it lets you know that there are more appropriate methods for the same purpose. So if you get the warning, just follow its advice: "Try using .loc[row_index,col_indexer] = value instead".

>>> df1.loc[:, 'f'] = p.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109
>>>

http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
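The warning typically appears when df1 was itself produced by filtering another frame, as in the question. A minimal sketch, assuming a small hand-made frame (hypothetical values chosen so that exactly rows 0 and 2 survive the filter): calling .copy() on the filtered result is not part of the answer above but is a common complement, since it makes df1 an explicitly independent object; the new column is then assigned through .loc with an index-aligned Series, as the warning suggests.

```python
import pandas as pd

# Hypothetical source frame; row 1 contains a value below -0.7.
df = pd.DataFrame({'a': [0.5, -1.0, 0.2], 'b': [0.1, 0.3, 0.4]})

# Boolean filtering may be flagged by pandas as a possible copy of df;
# an explicit .copy() makes the result clearly independent of df.
df1 = df[~(df < -0.7).any(axis=1)].copy()   # keeps index labels 0 and 2

# Assign the new column through .loc with a Series that reuses df1.index.
df1.loc[:, 'f'] = pd.Series([1.0, 2.0], index=df1.index)

print(df1)
```

Because df1 is a real copy, adding 'f' leaves the original df untouched.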

  Edit 2017

As indicated in the comments and by @Alexander, currently the best way to add the values of a Series as a new column of a DataFrame may be to use assign:

df1 = df1.assign(e=p.Series(np.random.randn(sLength)).values)
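Two details of this one-liner are easy to miss: .values strips the Series index, so the values are placed by position rather than aligned by label, and assign returns a new DataFrame instead of modifying df1, which is why the result is rebound to df1. A minimal sketch with a hypothetical three-row frame whose index is [2, 5, 9]:

```python
import pandas as pd

# Hypothetical frame with a non-contiguous index, like the filtered df1.
df1 = pd.DataFrame({'a': [1.0, 2.0, 3.0]}, index=[2, 5, 9])
e = pd.Series([0.1, 0.2, 0.3])  # default RangeIndex 0, 1, 2

# .values drops the Series index, so the values are placed by position.
out = df1.assign(e=e.values)

# Passing the Series itself aligns by label instead: only label 2 matches (e[2] == 0.3),
# so labels 5 and 9 get NaN.
aligned = df1.assign(e=e)

# assign returns a new DataFrame; df1 has no 'e' column unless you rebind it.
print(out)
print(aligned)
```

Using .values (or .to_numpy()) is therefore only safe when the Series is already in the same row order as the DataFrame, which the question guarantees.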

Source

License: CC BY-SA 3.0

http://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas
