 .. ipython:: python
    :suppress:

-   from pandas import *
-   import numpy.random as random
-   from numpy import *
+   import pandas as pd
+   import numpy as np
    options.display.max_rows=15

 Comparison with R / R libraries
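With ``import pandas as pd`` replacing the star import, the unchanged context line ``options.display.max_rows=15`` refers to a name that is no longer imported. The sketch below shows the suppressed setup block under the namespaced imports. It assumes the same 15-row display limit.

```python
import numpy as np
import pandas as pd

# Once the star import is gone, the display options live under the pd namespace
pd.options.display.max_rows = 15
```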
@@ -40,25 +39,25 @@ The :meth:`~pandas.DataFrame.query` method is similar to the base R ``subset``
 function. In R you might want to get the rows of a ``data.frame`` where one
 column's values are less than another column's values:

-.. code-block:: r
+.. code-block:: r

-   df <- data.frame(a=rnorm(10), b=rnorm(10))
-   subset(df, a <= b)
-   df[df$a <= df$b,] # note the comma
+   df <- data.frame(a=rnorm(10), b=rnorm(10))
+   subset(df, a <= b)
+   df[df$a <= df$b,] # note the comma

 In ``pandas``, there are a few ways to perform subsetting. You can use
 :meth:`~pandas.DataFrame.query` or pass an expression as if it were an
 index/slice as well as standard boolean indexing:

-.. ipython:: python
+.. ipython:: python

-   from pandas import DataFrame
-   from numpy.random import randn
+   from pandas import DataFrame
+   from numpy import random

-   df = DataFrame({'a': randn(10), 'b': randn(10)})
-   df.query('a <= b')
-   df[df.a <= df.b]
-   df.loc[df.a <= df.b]
+   df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
+   df.query('a <= b')
+   df[df.a <= df.b]
+   df.loc[df.a <= df.b]

 For more details and examples see :ref:`the query documentation
 <indexing.query>`.
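On Python 3 with current pandas, the three subsetting styles can be sketched as follows. The seeded ``np.random.default_rng`` generator is an assumption added for reproducibility. The diff itself calls ``random.randn``.

```python
import numpy as np
import pandas as pd

# Seeded generator for reproducible data (an assumption; the diff uses randn)
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.standard_normal(10), 'b': rng.standard_normal(10)})

# Three equivalent ways to keep the rows where column a <= column b
via_query = df.query('a <= b')
via_mask = df[df['a'] <= df['b']]
via_loc = df.loc[df['a'] <= df['b']]

print(via_query)
```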
@@ -72,20 +71,20 @@ For more details and examples see :ref:`the query documentation
 An expression using a data.frame called ``df`` in R with the columns ``a`` and
 ``b`` would be evaluated using ``with`` like so:

-.. code-block:: r
+.. code-block:: r

-   df <- data.frame(a=rnorm(10), b=rnorm(10))
-   with(df, a + b)
-   df$a + df$b # same as the previous expression
+   df <- data.frame(a=rnorm(10), b=rnorm(10))
+   with(df, a + b)
+   df$a + df$b # same as the previous expression

 In ``pandas`` the equivalent expression, using the
 :meth:`~pandas.DataFrame.eval` method, would be:

-.. ipython:: python
+.. ipython:: python

-   df = DataFrame({'a': randn(10), 'b': randn(10)})
-   df.eval('a + b')
-   df.a + df.b # same as the previous expression
+   df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
+   df.eval('a + b')
+   df.a + df.b # same as the previous expression

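The sketch below shows the ``with`` / ``eval`` correspondence under the same assumptions: Python 3 and a seeded ``default_rng``. It also shows that, in recent pandas, an assignment inside ``eval`` returns a new frame by default and leaves ``df`` unchanged.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({'a': rng.standard_normal(10), 'b': rng.standard_normal(10)})

# Names inside the string resolve to df's columns, much like R's with(df, a + b)
total = df.eval('a + b')
manual = df['a'] + df['b']

# An assignment expression returns a new frame with the extra column;
# df itself is untouched because inplace defaults to False
df_with_c = df.eval('c = a + b')
print(df_with_c.head())
```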
 In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than
 evaluation in pure Python. For more details and examples see :ref:`the eval
@@ -123,38 +122,38 @@ summarize ``x`` by ``month``:



-.. code-block:: r
+.. code-block:: r

-   require(plyr)
-   df <- data.frame(
-     x = runif(120, 1, 168),
-     y = runif(120, 7, 334),
-     z = runif(120, 1.7, 20.7),
-     month = rep(c(5,6,7,8),30),
-     week = sample(1:4, 120, TRUE)
-   )
+   require(plyr)
+   df <- data.frame(
+     x = runif(120, 1, 168),
+     y = runif(120, 7, 334),
+     z = runif(120, 1.7, 20.7),
+     month = rep(c(5,6,7,8),30),
+     week = sample(1:4, 120, TRUE)
+   )

-   ddply(df, .(month, week), summarize,
-         mean = round(mean(x), 2),
-         sd = round(sd(x), 2))
+   ddply(df, .(month, week), summarize,
+         mean = round(mean(x), 2),
+         sd = round(sd(x), 2))

 In ``pandas`` the equivalent expression, using the
 :meth:`~pandas.DataFrame.groupby` method, would be:



-.. ipython:: python
+.. ipython:: python

-   df = DataFrame({
-       'x': random.uniform(1., 168., 120),
-       'y': random.uniform(7., 334., 120),
-       'z': random.uniform(1.7, 20.7, 120),
-       'month': [5,6,7,8]*30,
-       'week': random.randint(1,4, 120)
-   })
+   df = DataFrame({
+       'x': random.uniform(1., 168., 120),
+       'y': random.uniform(7., 334., 120),
+       'z': random.uniform(1.7, 20.7, 120),
+       'month': [5,6,7,8]*30,
+       'week': random.randint(1,4, 120)
+   })

-   grouped = df.groupby(['month','week'])
-   print grouped['x'].agg([mean, std])
+   grouped = df.groupby(['month','week'])
+   print grouped['x'].agg([np.mean, np.std])


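The ``print`` statement above is Python 2 syntax. NumPy's integer generators also exclude the upper bound, so ``random.randint(1,4, 120)`` draws weeks 1 to 3 only, while R's ``sample(1:4, 120, TRUE)`` draws 1 to 4. The Python 3 sketch below addresses both points. It uses pandas named aggregation to mirror ``ddply``'s named, rounded summaries. The seeded generator is an assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame({
    'x': rng.uniform(1., 168., 120),
    'y': rng.uniform(7., 334., 120),
    'z': rng.uniform(1.7, 20.7, 120),
    'month': [5, 6, 7, 8] * 30,
    # high bound is exclusive, so this yields 1..4 like sample(1:4, 120, TRUE)
    'week': rng.integers(1, 5, 120),
})

# Named aggregation mirrors summarize(mean = ..., sd = ...); std uses ddof=1 like R's sd
summary = (df.groupby(['month', 'week'])['x']
             .agg(mean='mean', sd='std')
             .round(2))
print(summary)
```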
 For more details and examples see :ref:`the groupby documentation
@@ -169,35 +168,36 @@ reshape / reshape2
 An expression using a 3 dimensional array called ``a`` in R where you want to
 melt it into a data.frame:

-.. code-block:: r
+.. code-block:: r

-   a <- array(c(1:23, NA), c(2,3,4))
-   data.frame(melt(a))
+   a <- array(c(1:23, NA), c(2,3,4))
+   data.frame(melt(a))

 In Python, since ``a`` is an array, you can simply use a list comprehension.

-.. ipython:: python
-   a = array(range(1,24)+[NAN]).reshape(2,3,4)
-   DataFrame([tuple(list(x)+[val]) for x, val in ndenumerate(a)])
+.. ipython:: python
+
+   a = np.array(range(1,24)+[np.NAN]).reshape(2,3,4)
+   DataFrame([tuple(list(x)+[val]) for x, val in np.ndenumerate(a)])

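The expression ``range(1,24)+[np.NAN]`` only works on Python 2, where ``range`` returns a list. There is a second difference. R's ``array()`` fills column-major, while NumPy's ``reshape`` defaults to row-major, so the same 24 values land in different positions. The Python 3 sketch below assumes you want the R layout. The column names ``i``, ``j`` and ``k`` are illustrative.

```python
import numpy as np
import pandas as pd

# Python 3: range() is lazy, so materialise it before appending the missing value.
# order='F' makes the first axis vary fastest, the way R's array() fills.
a = np.array(list(range(1, 24)) + [np.nan]).reshape((2, 3, 4), order='F')

# One row per element: its index along each axis, then the value
# (column names are illustrative, not copied from R's output)
melted = pd.DataFrame(
    [idx + (val,) for idx, val in np.ndenumerate(a)],
    columns=['i', 'j', 'k', 'value'],
)
print(melted.head())
```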
 |meltlist|_
 ~~~~~~~~~~~~

 An expression using a list called ``a`` in R where you want to melt it
 into a data.frame:

-.. code-block:: r
+.. code-block:: r

-   a <- as.list(c(1:4, NA))
-   data.frame(melt(a))
+   a <- as.list(c(1:4, NA))
+   data.frame(melt(a))

 In Python, this list would be a list of tuples, so the
 :meth:`~pandas.DataFrame` constructor would convert it to a dataframe as required.

-.. ipython:: python
+.. ipython:: python

-   a = list(enumerate(range(1,5)+[NAN]))
-   DataFrame(a)
+   a = list(enumerate(range(1,5)+[np.NAN]))
+   DataFrame(a)

 For more details and examples see :ref:`the Intro to Data Structures
 documentation <basics.dataframe.from_items>`.
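The list case runs into the same Python 3 ``range`` issue. The sketch below builds the list explicitly and numbers positions from 1, as R indexes lists. The column names are illustrative.

```python
import numpy as np
import pandas as pd

# Build a real list before appending the missing value (Python 3 range is lazy)
values = list(range(1, 5)) + [np.nan]

# enumerate(..., start=1) pairs each element with a 1-based position
pairs = list(enumerate(values, start=1))
melted_list = pd.DataFrame(pairs, columns=['position', 'value'])
print(melted_list)
```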
@@ -208,26 +208,26 @@ documentation <basics.dataframe.from_items>`.
 An expression using a data.frame called ``cheese`` in R where you want to
 reshape the data.frame:

-.. code-block:: r
+.. code-block:: r

-   cheese <- data.frame(
-     first = c('John, Mary'),
-     last = c('Doe', 'Bo'),
-     height = c(5.5, 6.0),
-     weight = c(130, 150)
-   )
-   melt(cheese, id=c("first", "last"))
+   cheese <- data.frame(
+     first = c('John, Mary'),
+     last = c('Doe', 'Bo'),
+     height = c(5.5, 6.0),
+     weight = c(130, 150)
+   )
+   melt(cheese, id=c("first", "last"))

 In Python, the :meth:`~pandas.melt` method is the R equivalent:

-.. ipython:: python
+.. ipython:: python

-   cheese = DataFrame({'first': ['John', 'Mary'],
-                       'last': ['Doe', 'Bo'],
-                       'height': [5.5, 6.0],
-                       'weight': [130, 150]})
-   melt(cheese, id_vars=['first', 'last'])
-   cheese.set_index(['first', 'last']).stack() # alternative way
+   cheese = DataFrame({'first': ['John', 'Mary'],
+                       'last': ['Doe', 'Bo'],
+                       'height': [5.5, 6.0],
+                       'weight': [130, 150]})
+   pd.melt(cheese, id_vars=['first', 'last'])
+   cheese.set_index(['first', 'last']).stack() # alternative way

 For more details and examples see :ref:`the reshaping documentation
 <reshaping.melt>`.
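As written, the R snippet passes ``c('John, Mary')`` for ``first``, which is a single string. The two names on the Python side correspond to ``c('John', 'Mary')``. The sketch below shows the pandas call and also uses the ``var_name`` and ``value_name`` arguments to name the generated columns.

```python
import pandas as pd

cheese = pd.DataFrame({'first': ['John', 'Mary'],
                       'last': ['Doe', 'Bo'],
                       'height': [5.5, 6.0],
                       'weight': [130, 150]})

# first/last stay as identifiers; height and weight become variable/value rows
long = pd.melt(cheese, id_vars=['first', 'last'])

# var_name/value_name rename the generated columns
tidy = cheese.melt(id_vars=['first', 'last'], var_name='measure', value_name='amount')
print(tidy)
```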
@@ -238,33 +238,33 @@ For more details and examples see :ref:`the reshaping documentation
 An expression using a data.frame called ``df`` in R to cast into a higher
 dimensional array:

-.. code-block:: r
+.. code-block:: r

-   df <- data.frame(
-     x = runif(12, 1, 168),
-     y = runif(12, 7, 334),
-     z = runif(12, 1.7, 20.7),
-     month = rep(c(5,6,7),4),
-     week = rep(c(1,2), 6)
-   )
+   df <- data.frame(
+     x = runif(12, 1, 168),
+     y = runif(12, 7, 334),
+     z = runif(12, 1.7, 20.7),
+     month = rep(c(5,6,7),4),
+     week = rep(c(1,2), 6)
+   )

-   mdf <- melt(df, id=c("month", "week"))
-   acast(mdf, week ~ month ~ variable, mean)
+   mdf <- melt(df, id=c("month", "week"))
+   acast(mdf, week ~ month ~ variable, mean)

 In Python the best way is to make use of :meth:`~pandas.pivot_table`:

-.. ipython:: python
-
-   df = DataFrame({
-       'x': random.uniform(1., 168., 12),
-       'y': random.uniform(7., 334., 12),
-       'z': random.uniform(1.7, 20.7, 12),
-       'month': [5,6,7]*4,
-       'week': [1,2]*6
-   })
-   mdf = melt(df, id_vars=['month', 'week'])
-   pivot_table(mdf, values='value', rows=['variable','week'],
-               cols=['month'], aggfunc=mean)
+.. ipython:: python
+
+   df = DataFrame({
+       'x': random.uniform(1., 168., 12),
+       'y': random.uniform(7., 334., 12),
+       'z': random.uniform(1.7, 20.7, 12),
+       'month': [5,6,7]*4,
+       'week': [1,2]*6
+   })
+   mdf = pd.melt(df, id_vars=['month', 'week'])
+   pd.pivot_table(mdf, values='value', rows=['variable','week'],
+                  cols=['month'], aggfunc=np.mean)

 For more details and examples see :ref:`the reshaping documentation
 <reshaping.pivot>`.
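Later pandas releases renamed the ``rows`` and ``cols`` keywords of ``pivot_table`` to ``index`` and ``columns``, and the old names are no longer accepted. The sketch below performs the same cast with the current keywords. A seeded generator is assumed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df = pd.DataFrame({
    'x': rng.uniform(1., 168., 12),
    'y': rng.uniform(7., 334., 12),
    'z': rng.uniform(1.7, 20.7, 12),
    'month': [5, 6, 7] * 4,
    'week': [1, 2] * 6,
})

# Long format: one row per (month, week, variable) observation
mdf = pd.melt(df, id_vars=['month', 'week'])

# index/columns replace the older rows/cols keyword arguments
cast = pd.pivot_table(mdf, values='value', index=['variable', 'week'],
                      columns=['month'], aggfunc='mean')
print(cast)
```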