Adding new column to existing DataFrame in Python pandas

Questions

I have a DataFrame with named columns and rows indexed with non-contiguous numbers, as produced by this code:

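import numpy as np
from pandas import DataFrame, Series

df1 = DataFrame(np.random.randn(10, 4), columns=['a', 'b', 'c', 'd'])
mask = df1 < -0.7                 # element-wise, same result as applymap(lambda x: x < -0.7)
df1 = df1[~mask.any(axis=1)]      # drop rows containing any value below -0.7 (~ negates a boolean mask)
sLength = len(df1['a'])
e = Series(np.random.randn(sLength))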

I would like to add a new column, 'e', to the existing data frame without changing anything else in it. (The series always has the same length as the data frame.) I tried different versions of join, append, and merge, but did not get the result I wanted, only errors at best.

The series and data frame are already given; the above code only illustrates them with an example.

I am sure there is some easy way to do that, but I can't figure it out.

Answers

Use the original df1 index to create the series:

df1['e'] = Series(np.random.randn(sLength), index=df1.index)
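
The index matters because pandas aligns a Series to the DataFrame by index label, not by position, when it is assigned as a column. A minimal sketch with made-up values (the frame and the column names by_position and by_label are illustrative, not from the question):

```python
import pandas as pd

# A frame whose index is non-contiguous, like df1 after filtering.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0]}, index=[2, 6, 8])
values = [10.0, 20.0, 30.0]

# Default RangeIndex 0..2: only label 2 also exists in df's index,
# so the rows labelled 6 and 8 receive NaN.
df['by_position'] = pd.Series(values)

# Reusing df.index: every label matches, so all values are placed.
df['by_label'] = pd.Series(values, index=df.index)

print(df)
```

This is likely why a plain df1['e'] = e does not give the expected column in the question: e carries the default index 0..sLength-1, which only partly overlaps with the filtered index of df1.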


Edit 2015

Some users reported getting the SettingWithCopyWarning with this code.
However, the code still runs fine with pandas 0.16.1, the current version at the time of this edit.

>>> import numpy as np
>>> import pandas as p
>>> sLength = len(df1['a'])
>>> df1
          a         b         c         d
6 -0.269221 -0.026476  0.997517  1.294385
8  0.917438  0.847941  0.034235 -0.448948

>>> df1['e'] = p.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e
6 -0.269221 -0.026476  0.997517  1.294385  1.757167
8  0.917438  0.847941  0.034235 -0.448948  2.228131

>>> p.__version__
'0.16.1'

The SettingWithCopyWarning aims to inform you of a possibly invalid assignment on a copy of the DataFrame. It doesn't necessarily mean you did it wrong (it can trigger false positives), but from 0.13.0 on it lets you know there are more adequate methods for the same purpose. So, if you get the warning, just follow its advice: "Try using .loc[row_index,col_indexer] = value instead"

>>> df1.loc[:, 'f'] = p.Series(np.random.randn(sLength), index=df1.index)
>>> df1
          a         b         c         d         e         f
6 -0.269221 -0.026476  0.997517  1.294385  1.757167 -0.050927
8  0.917438  0.847941  0.034235 -0.448948  2.228131  0.006109

http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
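
One common way to run into this warning is assigning a column to a frame that was itself produced by filtering another frame, as df1 is in the question. A hedged sketch, assuming it is acceptable to work on an independent copy (the names original and filtered are illustrative): calling .copy() after filtering makes the intent explicit, so the later assignment cannot be mistaken for a write to the original.

```python
import pandas as pd

original = pd.DataFrame({'a': [0.5, -1.2, 0.3], 'b': [0.1, 0.4, 0.9]})

# Keep rows where every value is above -0.7, then take an explicit copy
# so that `filtered` is clearly an independent object.
filtered = original[(original > -0.7).all(axis=1)].copy()

# Add the new column, aligned on the filtered (non-contiguous) index.
filtered['e'] = pd.Series([1.0] * len(filtered), index=filtered.index)

print(filtered)
```

Whether the warning appears at all depends on the pandas version and on how the frame was created, so treat this as one way to sidestep it rather than a required step.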



Edit 2017

As indicated in the comments and by @Alexander, probably the best method at the time of this edit to add the values of a Series as a new column of a DataFrame is assign:

df1 = df1.assign(e=p.Series(np.random.randn(sLength)).values)
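
Two details of that line are easy to miss: assign returns a new DataFrame rather than modifying df1 in place, and .values strips the Series index so that the data is placed by position. A small sketch with made-up values (the names with_values and with_series are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0]}, index=[6, 8])
s = pd.Series([10.0, 20.0])           # default index 0, 1

with_values = df.assign(e=s.values)   # positional: the index of s is ignored
with_series = df.assign(e=s)          # label-aligned: no labels overlap, so all NaN

print(with_values)
print(with_series)
print(df.columns.tolist())            # df itself is left unchanged
```

Passing .values is safe here only because the series is known to have the same length and row order as the data frame, as stated in the question.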

Source

License: CC BY-SA 3.0

http://stackoverflow.com/questions/12555323/adding-new-column-to-existing-dataframe-in-python-pandas
