44.. ipython:: python
55 :suppress:
66
7- from pandas import *
7+ import pandas as pd
8+ import numpy as np
89 options.display.max_rows = 15
910
1011 Comparison with R / R libraries
@@ -38,25 +39,25 @@ The :meth:`~pandas.DataFrame.query` method is similar to the base R ``subset``
3839function. In R you might want to get the rows of a ``data.frame`` where one
3940column's values are less than another column's values:
4041
41- .. code-block:: r
42+ .. code-block:: r
4243
43- df <- data.frame(a=rnorm(10), b=rnorm(10))
44- subset(df, a <= b)
45- df[df$a <= df$b,] # note the comma
44+ df <- data.frame(a=rnorm(10), b=rnorm(10))
45+ subset(df, a <= b)
46+ df[df$a <= df$b,] # note the comma
4647
4748 In ``pandas``, there are a few ways to perform subsetting. You can use
4849:meth:`~pandas.DataFrame.query` or pass an expression as if it were an
4950index/slice as well as standard boolean indexing:
5051
51- .. ipython:: python
52+ .. ipython:: python
5253
53- from pandas import DataFrame
54- from numpy.random import randn
54+ from pandas import DataFrame
55+ from numpy import random
5556
56- df = DataFrame({'a': randn(10), 'b': randn(10)})
57- df.query('a <= b')
58- df[df.a <= df.b]
59- df.loc[df.a <= df.b]
57+ df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
58+ df.query('a <= b')
59+ df[df.a <= df.b]
60+ df.loc[df.a <= df.b]
6061
6162 For more details and examples see :ref:`the query documentation
6263<indexing.query>`.
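As a side note to the three subsetting forms in this hunk, the following is a minimal sketch (not part of the patch) that checks they select the same rows. The seeded generator and the variable names ``by_query``, ``by_mask``, ``by_loc`` and ``same`` are illustrative assumptions, not anything taken from the pandas docs.

```python
import numpy as np
import pandas as pd

# Fixed seed, chosen here only so the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(10), "b": rng.standard_normal(10)})

by_query = df.query("a <= b")        # expression evaluated against column names
by_mask = df[df["a"] <= df["b"]]     # plain boolean indexing
by_loc = df.loc[df["a"] <= df["b"]]  # .loc with the same boolean mask

# All three approaches should return identical frames.
same = by_query.equals(by_mask) and by_mask.equals(by_loc)
```

Bracket access (``df["a"]``) is used instead of attribute access only to avoid clashes with DataFrame method names; it behaves the same as ``df.a`` here.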
@@ -70,20 +71,20 @@ For more details and examples see :ref:`the query documentation
7071An expression using a data.frame called ``df`` in R with the columns ``a`` and
7172``b`` would be evaluated using ``with`` like so:
7273
73- .. code-block:: r
74+ .. code-block:: r
7475
75- df <- data.frame(a=rnorm(10), b=rnorm(10))
76- with(df, a + b)
77- df$a + df$b # same as the previous expression
76+ df <- data.frame(a=rnorm(10), b=rnorm(10))
77+ with(df, a + b)
78+ df$a + df$b # same as the previous expression
7879
7980 In ``pandas`` the equivalent expression, using the
8081:meth:`~pandas.DataFrame.eval` method, would be:
8182
82- .. ipython:: python
83+ .. ipython:: python
8384
84- df = DataFrame({'a': randn(10), 'b': randn(10)})
85- df.eval('a + b')
86- df.a + df.b # same as the previous expression
85+ df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
86+ df.eval('a + b')
87+ df.a + df.b # same as the previous expression
8788
8889 In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than
8990evaluation in pure Python. For more details and examples see :ref:`the eval
9899plyr
99100----
100101
102+ ``plyr`` is an R library for the split-apply-combine strategy for data
103+ analysis. The functions revolve around three data structures in R, ``a``
104+ for ``arrays``, ``l`` for ``lists``, and ``d`` for ``data.frame``. The
105+ table below shows how these data structures could be mapped in Python.
106+
107+ +------------+-------------------------------+
108+ | R | Python |
109+ +============+===============================+
110+ | array | list |
111+ +------------+-------------------------------+
112+ | lists | dictionary or list of objects |
113+ +------------+-------------------------------+
114+ | data.frame | dataframe |
115+ +------------+-------------------------------+
116+
117+ |ddply|_
118+ ~~~~~~~~
119+
120+ An expression using a data.frame called ``df`` in R where you want to
121+ summarize ``x`` by ``month`` and ``week``:
122+
123+
124+
125+ .. code-block:: r
126+
127+ require(plyr)
128+ df <- data.frame(
129+ x = runif(120, 1, 168),
130+ y = runif(120, 7, 334),
131+ z = runif(120, 1.7, 20.7),
132+ month = rep(c(5,6,7,8),30),
133+ week = sample(1:4, 120, TRUE)
134+ )
135+
136+ ddply(df, .(month, week), summarize,
137+ mean = round(mean(x), 2),
138+ sd = round(sd(x), 2))
139+
140+ In ``pandas`` the equivalent expression, using the
141+ :meth:`~pandas.DataFrame.groupby` method, would be:
142+
143+
144+
145+ .. ipython:: python
146+
147+ df = DataFrame({
148+ 'x': random.uniform(1., 168., 120),
149+ 'y': random.uniform(7., 334., 120),
150+ 'z': random.uniform(1.7, 20.7, 120),
151+ 'month': [5, 6, 7, 8] * 30,
152+ 'week': random.randint(1, 5, 120) # upper bound is exclusive, so this matches sample(1:4, ...)
153+ })
154+
155+ grouped = df.groupby(['month', 'week'])
156+ print(grouped['x'].agg([np.mean, np.std]))
157+
158+
159+ For more details and examples see :ref:`the groupby documentation
160+ <groupby.aggregate>`.
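The R call above also rounds the results and names the output columns ``mean`` and ``sd``. A minimal sketch of one way to mirror that with named aggregation follows; it is not part of the patch. The seeded generator and the name ``summary`` are assumptions made for illustration, and only ``x`` is generated because it is the only column being summarized.

```python
import numpy as np
import pandas as pd

# Fixed seed, chosen here only so the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.uniform(1.0, 168.0, 120),
    "month": [5, 6, 7, 8] * 30,
    "week": rng.integers(1, 5, 120),  # upper bound is exclusive, so weeks are 1..4
})

# Group by month and week, then compute the mean and the sample standard
# deviation of x, rounded to two decimals like the ddply call.
summary = (
    df.groupby(["month", "week"])["x"]
      .agg(mean="mean", sd="std")
      .round(2)
      .reset_index()
)
```

pandas computes ``std`` with ``ddof=1`` by default, which corresponds to R's ``sd``; ``np.std`` called directly on an array defaults to ``ddof=0``.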
161+
101162reshape / reshape2
102163------------------
103164
165+ |meltarray|_
166+ ~~~~~~~~~~~~~
167+
168+ An expression using a 3 dimensional array called ``a`` in R where you want to
169+ melt it into a data.frame:
170+
171+ .. code-block:: r
172+
173+ a <- array(c(1:23, NA), c(2,3,4))
174+ data.frame(melt(a))
175+
176+ In Python, ``a`` is a NumPy array, so you can build the rows with a list comprehension.
177+
178+ .. ipython:: python
179+
180+ a = np.array(list(range(1, 24)) + [np.nan]).reshape(2, 3, 4)
181+ DataFrame([tuple(list(x) + [val]) for x, val in np.ndenumerate(a)])
182+
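To make the shape of the melted result explicit, here is a minimal sketch of the same ``np.ndenumerate`` approach with labelled columns. It is not part of the patch, and the column names ``dim0``, ``dim1``, ``dim2`` and ``value`` are hypothetical labels chosen for illustration.

```python
import numpy as np
import pandas as pd

a = np.array(list(range(1, 24)) + [np.nan]).reshape(2, 3, 4)

# Each item from ndenumerate is ((i, j, k), value); append the value to the
# index tuple so every array cell becomes one row.
rows = [idx + (val,) for idx, val in np.ndenumerate(a)]
melted = pd.DataFrame(rows, columns=["dim0", "dim1", "dim2", "value"])
```

Two differences from R are worth keeping in mind: the indices here are zero-based, and NumPy fills and iterates in row-major order, so the values land in different cells than in R's column-major ``array``.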
183+ |meltlist|_
184+ ~~~~~~~~~~~~
185+
186+ An expression using a list called ``a`` in R where you want to melt it
187+ into a data.frame:
188+
189+ .. code-block:: r
190+
191+ a <- as.list(c(1:4, NA))
192+ data.frame(melt(a))
193+
194+ In Python, this list would be a list of tuples, so the
195+ :class:`~pandas.DataFrame` constructor would convert it to a dataframe as required.
196+
197+ .. ipython:: python
198+
199+ a = list(enumerate(list(range(1, 5)) + [np.nan]))
200+ DataFrame(a)
201+
202+ For more details and examples see :ref:`the Intro to Data Structures
203+ documentation <basics.dataframe.from_items>`.
204+
205+ |meltdf|_
206+ ~~~~~~~~~~~~~~~~
207+
208+ An expression using a data.frame called ``cheese`` in R where you want to
209+ reshape the data.frame:
210+
211+ .. code-block:: r
212+
213+ cheese <- data.frame(
214+ first = c('John', 'Mary'),
215+ last = c('Doe', 'Bo'),
216+ height = c(5.5, 6.0),
217+ weight = c(130, 150)
218+ )
219+ melt(cheese, id=c("first", "last"))
220+
221+ In Python, the :func:`~pandas.melt` function is the equivalent of R's ``melt``:
222+
223+ .. ipython:: python
224+
225+ cheese = DataFrame({'first': ['John', 'Mary'],
226+ 'last': ['Doe', 'Bo'],
227+ 'height': [5.5, 6.0],
228+ 'weight': [130, 150]})
229+ pd.melt(cheese, id_vars=['first', 'last'])
230+ cheese.set_index(['first', 'last']).stack() # alternative way
231+
232+ For more details and examples see :ref:`the reshaping documentation
233+ <reshaping.melt>`.
234+
235+ |cast|_
236+ ~~~~~~~
237+
238+ An expression using a data.frame called ``df`` in R to cast into a higher
239+ dimensional array:
240+
241+ .. code-block:: r
242+
243+ df <- data.frame(
244+ x = runif(12, 1, 168),
245+ y = runif(12, 7, 334),
246+ z = runif(12, 1.7, 20.7),
247+ month = rep(c(5,6,7),4),
248+ week = rep(c(1,2), 6)
249+ )
250+
251+ mdf <- melt(df, id=c("month", "week"))
252+ acast(mdf, week ~ month ~ variable, mean)
253+
254+ In Python the best way is to make use of :func:`~pandas.pivot_table`:
255+
256+ .. ipython:: python
257+
258+ df = DataFrame({
259+ 'x': random.uniform(1., 168., 12),
260+ 'y': random.uniform(7., 334., 12),
261+ 'z': random.uniform(1.7, 20.7, 12),
262+ 'month': [5, 6, 7] * 4,
263+ 'week': [1, 2] * 6
264+ })
265+ mdf = pd.melt(df, id_vars=['month', 'week'])
266+ pd.pivot_table(mdf, values='value', index=['variable', 'week'],
267+ columns=['month'], aggfunc=np.mean)
268+
269+ For more details and examples see :ref:`the reshaping documentation
270+ <reshaping.pivot>`.
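For readers on a recent pandas release, the following minimal sketch shows the same melt-then-pivot round trip using the ``index`` and ``columns`` keywords, which replaced the older ``rows`` and ``cols`` spellings. It is not part of the patch, and the seeded generator and the name ``table`` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Fixed seed, chosen here only so the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.uniform(1.0, 168.0, 12),
    "y": rng.uniform(7.0, 334.0, 12),
    "z": rng.uniform(1.7, 20.7, 12),
    "month": [5, 6, 7] * 4,
    "week": [1, 2] * 6,
})

# Long format: one row per (month, week, variable) observation.
mdf = pd.melt(df, id_vars=["month", "week"])

# Wide format: (variable, week) down the side, month across the top,
# with the mean of the duplicated observations in each cell.
table = pd.pivot_table(
    mdf,
    values="value",
    index=["variable", "week"],
    columns="month",
    aggfunc="mean",
)
```

The result is a two-dimensional frame with a two-level row index rather than a true three-dimensional array as returned by ``acast``; selecting one ``variable`` level with ``table.loc["x"]`` gives the week-by-month slice for that variable.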
104271
105272.. |with| replace:: ``with``
106273.. _with: http://finzi.psych.upenn.edu/R/library/base/html/with.html
107274
108275.. |subset| replace:: ``subset``
109276.. _subset: http://finzi.psych.upenn.edu/R/library/base/html/subset.html
277+
278+ .. |ddply| replace:: ``ddply``
279+ .. _ddply: http://www.inside-r.org/packages/cran/plyr/docs/ddply
280+
281+ .. |meltarray| replace:: ``melt.array``
282+ .. _meltarray: http://www.inside-r.org/packages/cran/reshape2/docs/melt.array
283+
284+ .. |meltlist| replace:: ``melt.list``
285+ .. _meltlist: http://www.inside-r.org/packages/cran/reshape2/docs/melt.list
286+
287+ .. |meltdf| replace:: ``melt.data.frame``
288+ .. _meltdf: http://www.inside-r.org/packages/cran/reshape2/docs/melt.data.frame
289+
290+ .. |cast| replace:: ``cast``
291+ .. _cast: http://www.inside-r.org/packages/cran/reshape2/docs/cast
292+