Skip to content

Conversation

@yvonnefroehlich
Copy link
Member

@yvonnefroehlich yvonnefroehlich commented Dec 24, 2024

This PR adds a JN for GeoPandas - spatial data - GeoDataFrame:

  • point geometry - cities
  • line geometry - rivers
  • polygon geometry - population

Preview:

Africa - issue: disputed borders Chicago - issue: multiple data sources USA - issue: include Alaska & Hawaii
image image hawaii

@gitnotebooks
Copy link

gitnotebooks bot commented Dec 24, 2024

Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/GenericMappingTools/pygmt-paper-figures/pull/6

@seisman seisman changed the title Figure XY: GeoPandas Figure 6: GeoPandas Oct 17, 2025
@seisman seisman changed the title Figure 6: GeoPandas Figure 6: GeoPandas7 Oct 17, 2025
@seisman seisman changed the title Figure 6: GeoPandas7 Figure 7: GeoPandas Oct 17, 2025
@seisman seisman mentioned this pull request Nov 7, 2025
3 tasks
@seisman
Copy link
Member

seisman commented Nov 7, 2025

I'm unsure if we need this example. As I understand it, PyGMT can plot GeoPandas objects, but its functionality for processing them is limited since the aspatial metadata will be lost during processing.

@yvonnefroehlich
Copy link
Member Author

I'm unsure if we need this example. As I understand it, PyGMT can plot GeoPandas objects, but its functionality for processing them is limited since the aspatial metadata will be lost during processing.

I think the idea of having a choropleth map is, to show that there is functionality extending the feature available in GMT.

@seisman
Copy link
Member

seisman commented Nov 19, 2025

This example highlights GeoPandas integration in PyGMT. A geopandas.GeoDataFrame may contain any of the standard geometry types: Point, LineString, MultiLineString, Polygon, and MultiPolygon.

The script below plots three different datasets (available from https://www.naturalearthdata.com/) with different geometrie.

  • world — country boundaries represented by Polygon and MultiPolygon geometries
  • rivers — major rivers represented by LineString geometries
  • cities — populated places represented by Point geometries

So, only MultiLineString is not covered.

import pygmt
import geopandas as gpd

world = gpd.read_file("https://naciscdn.org/naturalearth/50m/cultural/ne_50m_admin_0_countries.zip")
rivers = gpd.read_file("https://naciscdn.org/naturalearth/110m/physical/ne_110m_rivers_lake_centerlines.zip")  
cities = gpd.read_file("https://naciscdn.org/naturalearth/110m/cultural/ne_110m_populated_places_simple.zip")

world["POP_EST"] *= 1.0e-5

fig = pygmt.Figure()
fig.basemap(region="=SA", projection="M15c", frame=True)
fig.plot(data=world, pen="0.5p,black", fill="lightgreen")
pygmt.makecpt(cmap="turbo", series=(0, 3000, 100))
fig.plot(
    data=world,
    pen="0.2p,gray10",
    fill="+z",
    cmap=True,
    aspatial="Z=POP_EST",
)
fig.colorbar(frame=True)
fig.plot(data=rivers, pen="1p,blue")
fig.plot(data=cities, style="c0.1c", fill="red", pen="black")
fig.text(x=cities.geometry.x, y=cities.geometry.y, text=cities["name"], font="10p,Helvetica-Bold,black", offset="0.2c/0.2c")
fig.show()
map

@yvonnefroehlich
Copy link
Member Author

yvonnefroehlich commented Nov 19, 2025

Your example looks quite promising, but I get the following error when running the code:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to
encoding with 'cp1252' codec failed

Will look it in detail tomorrow or within the next days.

@seisman seisman changed the title Figure 7: GeoPandas Figure 6: GeoPandas Nov 20, 2025
@seisman
Copy link
Member

seisman commented Nov 20, 2025

It's likely because the city names contain non-ASCII characters, which doesn't work well in your console.

This is a likely solution for you:

cities["name"] = cities["name"].apply(lambda x: x.encode("utf-8", errors="ignore").decode("utf-8"))

@yvonnefroehlich
Copy link
Member Author

It's likely because the city names contain non-ASCII characters, which doesn't work well in your console.

This is a likely solution for you:

cities["name"] = cities["name"].apply(lambda x: x.encode("utf-8", errors="ignore").decode("utf-8"))

Oh, thanks! I already tried something similar, but it did solve the problem yet. For now, I focus on the geometry column of the dataframe, then the codes work.

@yvonnefroehlich
Copy link
Member Author

Using region="=SA" shows that country codes are supported. But the overlapping labels for the cities between 5°-25° North do not look nice.

region="SA" region=[-89, -33, -56.5, 9.7]
Fig7_PyGMT_geopandas_completregion Fig7_PyGMT_geopandas_smallerregion

@seisman
Copy link
Member

seisman commented Nov 21, 2025

What about Africa? Squares are populated cities, while only "worldcity" (city with global significance) are labelled.

Fig7_PyGMT_geopandas_ne
import pygmt
import geopandas as gpd

world = gpd.read_file("https://naciscdn.org/naturalearth/50m/cultural/ne_50m_admin_0_countries.zip")
rivers = gpd.read_file("https://naciscdn.org/naturalearth/110m/physical/ne_110m_rivers_lake_centerlines.zip")
cities = gpd.read_file("https://naciscdn.org/naturalearth/110m/cultural/ne_110m_populated_places_simple.zip")
world["POP_EST"] *= 1.0e-6

fig = pygmt.Figure()
fig.basemap(region="=AF", projection="Q15c", frame=True)
pygmt.makecpt(cmap="batlow", series=(0, 270, 100)) 
fig.plot(
    data=world[["POP_EST", "geometry"]],
    pen="1p,gray50",
    fill="+z",
    cmap=True,
    aspatial="Z=POP_EST",
)
fig.colorbar(frame="x+lPopulation (millions)")
fig.plot(data=rivers["geometry"], pen="0.5p,skyblue")
fig.plot(data=cities["geometry"], style="s0.3c", fill="red", pen="1p,black")

mask = cities.worldcity == 1
fig.text(
    x=cities[mask].geometry.x,
    y=cities[mask].geometry.y,
    text=cities[mask]["name"],
    font="10p,Helvetica-Bold,black",
    offset="0c/-0.3c",
    justify="TR",
    fill="white@30",
)
fig.show()
fig.savefig(fname="Fig7_PyGMT_geopandas_ne.pdf")

@yvonnefroehlich
Copy link
Member Author

yvonnefroehlich commented Dec 1, 2025

I also didn't find one, but it seems we can calculate area of polygons, which can be used as fills.

Here is an example showing the area of US states.

I modified your code and added rivers and cities. Currently I am using a Lambert projection, and Alaska is excluded. It would be good to not show the rivers outside the USA.

with Alaska, projection="Q15" with Alaska, projection="L-96/35/33/41/12c" without Alaska, projection="L-96/35/33/41/12c"
image image image

yvonnefroehlich and others added 2 commits December 1, 2025 20:02
Co-authored-by: Michael Grund <23025878+michaelgrund@users.noreply.github.com>
@yvonnefroehlich yvonnefroehlich changed the title Figure 6: GeoPandas / spatial data - choropleth map - population with <lines> and <points> of XY Figure 6: GeoPandas / spatial data - choropleth map - <polygons,> with <lines> and <points> of XY Dec 1, 2025
@yvonnefroehlich yvonnefroehlich changed the title Figure 6: GeoPandas / spatial data - choropleth map - <polygons,> with <lines> and <points> of XY Figure 6: GeoPandas / spatial data - choropleth map - <polygons> with <lines> and <points> of XY Dec 1, 2025
@yvonnefroehlich
Copy link
Member Author

Made some updates, without building subsets for the cities and using the 110 m dataset for the rivers. Now, we do not have rivers outside the USA (but only one quite long river).

image

Co-authored-by: Michael Grund <23025878+michaelgrund@users.noreply.github.com>
@yvonnefroehlich yvonnefroehlich changed the title Figure 6: GeoPandas / spatial data - choropleth map - <polygons> with <lines> and <points> of XY Figure 6: GeoPandas / spatial data - choropleth map - area with rivers and cities of the states of the USA Dec 2, 2025
@yvonnefroehlich
Copy link
Member Author

yvonnefroehlich commented Dec 2, 2025

Really like the right Chicago example, including lake etc.

Thanks! Also like this map. But Natural Earth does not provide such detailed data of Chicago which can be used for the choropleth part. If we want to have the same datasource for all datasets, we probably will go with the USA example showing the area of the different states with rivers and cities. Now need to think about Alaska. It looks a bit strange when just plotting it together with the rest of the USA. Sometimes Alaska is placed seperatly as an inset; maybe we can do this? Edit: See review by Dongdong #6 (comment) .

Comment on lines +36 to +47
# Add Alaska and Hawaii separately in the lower left corner
for xshift, region in zip(["0.9c", "2.3c"], [[172, 230, 51, 72], [-168, -154, 18, 29]]):
with fig.shift_origin(xshift=xshift):
fig.plot(
data=states,
region=region,
projection="M2.5c",
cmap=True,
pen="0.2p,gray50",
fill="+z",
aspatial="Z=area_sqkm"
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how we describe this figure in the manuscript, maybe it makes sense to move the code for insets up to the code part for plotting the polygons of the main map (currently line 18).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would change the region and projection settings and we have to set region and projection again when plotting rivers and cities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's true. We do not use Figure.inset here.

fig = pygmt.Figure()
fig.basemap(projection="L-96/35/33/41/12c", region=[-126, -66, 25, 49], frame="+n")

pygmt.makecpt(cmap="bilbao", series=[0, states["area_sqkm"].max()])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use a different CPT or a different range, to avoid polygons that are too light or too dark?

Copy link
Member Author

@yvonnefroehlich yvonnefroehlich Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually not so easy, as Alaska is so much larger and Hawaii so much smaller than the other states.

hawaii nuuk imola bamako
hawaii nuuk imola bamako
acton batlow navia bilbao (currently)
acton batlow navia image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "hawaii" is better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, from the CPTs I tested I think "hawaii" is the best (even I am not really a fan of the colors in this colormap itself).

Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
clearance="0.05c+tO",
)

# Add Alaska and Hawaii separately in the lower left corner
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we set the region for Alaska and Hawaii manually. Ideally, we should be able to get the region from the data directly. The tricky thing is that Hawaii has many small islands, and the region is [-178., -154., 18., 29.]. The code below removes small islands with area < 1.0e8 m^2, and then the region becomes [-161., -154., 18., 23.]. I'm wondering if the code below is better, although we need to explain more details in the main text.

# Add Alaska and Hawaii separately in the lower left corner
for name, xshift in zip(["Alaska", "Hawaii"], ["0.9c", "2.5c"]):
    substates = states[(states["name"] == name)]
    substates = substates.explode()
    substates = substates[substates.to_crs(epsg=6933).area > 1.0e8].dissolve()
    region = pygmt.info(substates, spacing=1)
    with fig.shift_origin(xshift=xshift):
        fig.plot(
            data=substates,
            region=region,
            projection="M2c",
            cmap=True,
            pen="0.2p,gray50",
            fill="+z",
            aspatial="Z=area_sqkm",
        )

rivers = rivers[rivers.intersects(states.union_all())]
cities = cities[cities["adm0name"] == "United States of America"]

states["area_sqkm"] = states.to_crs(epsg=6933).area / 10 ** 9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original dataset has the area_sqkm column with zero values. That's why we compute the areas and store them in this column. But, since the area is in units of 1000 km^2, I feel the column name is more confusing. Maybe we just save it to a new column "area" instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants