Figure 7: NumPy / datetime objects - date time chart - GitHub stars of GMT and the wrappers #10
Found 1 changed notebook. Review the changes at https://app.gitnotebooks.com/GenericMappingTools/pygmt-paper-figures/pull/10
My suggestions:
>>> import pandas as pd
>>> df = pd.read_csv("/home/seisman/Downloads/star-history-20251017.csv")
>>> df["Date"] = df["Date"].str.split(" \\(").str[0]
>>> df["Date"] = pd.to_datetime(df["Date"], format="%a %b %d %Y %H:%M:%S GMT%z")
The main purpose is to show that PyGMT integrates well with existing Python workflows (load data from a file, process it, then visualize it) and that it handles datetime data correctly.
import pygmt

fig = pygmt.Figure()
for csvfile, color, label in zip(
    ["star-history-gmt.csv", "star-history-pygmt.csv", "star-history-gmtjl.csv"],
    ["238/86/52", "63/124/173", "170/121/193"],
    ["GMT", "PyGMT", "GMT.jl"],
):
    df = pd.read_csv(csvfile)
    df["Date"] = df["Date"].str.split(" \\(").str[0]
    df["Date"] = pd.to_datetime(df["Date"], format="%a %b %d %Y %H:%M:%S GMT%z")
    fig.plot(x=df["Date"], y=df["Stars"], pen=color, no_clip=True)
    fig.plot(x=df["Date"], y=df["Stars"], fill=color, style="a0.35c", no_clip=True, label=label)
fig.legend(position="jTL")
fig.show()
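To make the two parsing steps in the suggestion concrete, here is a minimal sketch on a single made-up row. The sample date string and star count are assumptions chosen to mimic the JavaScript-style timestamps that the format string implies; they are not taken from the actual CSV files.

```python
import pandas as pd

# Hypothetical row imitating the date strings the format string expects;
# the values are made up for illustration.
df = pd.DataFrame(
    {
        "Date": ["Tue Jun 06 2017 04:31:21 GMT+0800 (China Standard Time)"],
        "Stars": [1],
    }
)

# Step 1: drop the trailing " (time zone name)" part.
df["Date"] = df["Date"].str.split(r" \(", regex=True).str[0]

# Step 2: parse the remaining text, including the numeric UTC offset (%z).
df["Date"] = pd.to_datetime(df["Date"], format="%a %b %d %Y %H:%M:%S GMT%z")

print(df["Date"].iloc[0])
```

The parsed column is timezone-aware, so the offset from the original string is kept rather than silently dropped.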
…p | Remove codes for temporal alignment
Thanks for your detailed recommendations!
Yes, I agree reading the data directly from the CSV files is better. I remember that reading the files directly did not work for me, and the code in this PR was still the code I used for the figure in the AGU talk 🙈. Following your suggestions, the example is updated in commit 9d3fb0d.
Co-authored-by: Dongdong Tian <seisman.info@gmail.com>
The Jupyter Notebook appears to be broken, likely because one of my suggestions didn’t conform to the notebook’s JSON format. In other words, we should avoid making changes to notebooks directly through the GitHub web interface. Edit: I just saw that you have fixed it. Thanks.
Actually, I am wondering if it would make the discussion and review more convenient to first start with normal Python scripts and, after we have agreed on (nearly) final versions of the figures and codes, copy things into a JN (and maybe split it over multiple cells)?
Yes, I also feel a normal Python script is more convenient.
Added normal Python scripts for all figures (PRs).
Fig7_PyGMT_datetime.py
Outdated
    )

    fig.plot(x=df["Date"], y=df["Stars"], pen=color)
    fig.plot(x=df["Date"], y=df["Stars"], fill=color, style="a0.35c", label=wrapper)
Perhaps we should use four different symbols, although the star symbol is appropriate for a plot showing GitHub star history.
Hm, I think I'd like to keep the stars. What do the others think?
The benefits of using different symbols are twofold: first, they demonstrate that PyGMT supports a wide variety of marker styles, and that auto-legend works well with them. Second, symbols with distinct markers are more effective for gray-scale printing and are more accessible for colorblind readers. For me, it's a little difficult to distinguish Julia's purple from Python's blue.
BTW, where did you get the colors?
- The GMT red "238/86/52" is from the gmtlogo.c in the GMT repo
- The Matlab one?
- The Julia purple "170/121/193" should be "149/88/178" (https://github.com/JuliaLang/julia-logo-graphics).
- The Python blue "63/124/173" should be "48/105/152" (https://www.brandcolorcode.com/python).
With the Python yellow "255/212/59", the figure looks good to me, even with the same symbols:
Agree with @seisman. Especially for gray-scale printing (some people still do that 😆) and for colorblind readers, different symbols make it much easier to distinguish between the different entries.
- I suggest avoiding yellow on a white background.
- I would change the legend entry order so that it follows the order of the last shown data point of each entry. This makes the figure much easier to read:
- GMT
- PyGMT
- GMT.jl
- GMT/MEX
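The gray-scale argument can be checked roughly in numbers. The sketch below converts each logo color quoted in this thread to an approximate gray level; the Rec. 601 luma weights are an assumption about how a printer or viewer converts to gray, and the helper name `gray_level` is hypothetical.

```python
def gray_level(rgb: str) -> float:
    """Approximate gray level (0-255) of a GMT-style "R/G/B" color string."""
    r, g, b = (int(v) for v in rgb.split("/"))
    # Rec. 601 luma weights; an assumption about the gray-scale conversion.
    return 0.299 * r + 0.587 * g + 0.114 * b


# Colors as quoted in this thread (purple and blue use the corrected values).
colors = {
    "GMT": "238/86/52",
    "GMT/MEX": "253/131/68",
    "GMT.jl": "149/88/178",
    "PyGMT": "48/105/152",
}
for label, rgb in colors.items():
    print(f"{label:8s} {rgb:12s} gray ~ {gray_level(rgb):.0f}")
```

Entries whose gray levels come out close together are the ones that benefit most from distinct symbols.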
distinct markers are more effective for gray-scale printing and are more accessible for colorblind readers.
This is a fair argument. Changed it in commit c144d8c.
gray-scale printing (some people still do that 😆)
Yep, students xD (and then color it by hand).
BTW, where did you get the colors?
Good question. I think these are the RGB codes I used for the example in the AGU presentation one year ago. I thought these were the official RGB codes, but unfortunately I do not remember what went wrong here 🙁. Thanks for spotting! For the MATLAB logo, I extracted the RGB code from the logo itself, as I did not find a source for the official RGB code. Edit: Also fixed in the codes for the AGU presentation.
I suggest to avoid yellow color on a white background.
Agree, yellow on a white background is not optimal, and how good the contrast / visibility is depends on the screen. Thus, I would keep Python's blue.
Would change the legend entry order so that it follows the order of the last shown data point of each entry. Makes it much easier to read the figure.
Hm. The order is currently based on the start of the project; maybe this is helpful for describing the figure in the text.
Yep, I got this. If people prefer this order I am OK with reordering this (see commit 0018c93). Probably, only a few people will recognize the order based on the start of the projects.
Another option would be to get rid of the whole legend and add the repo labels at the end of each corresponding line. Just depends on if we want to show legend creation or not...
This example shows the auto-legend feature, so I feel we should keep it.
I would keep the auto-legend feature. I think it's important to show that PyGMT can do this. For simple automated plots, updating the legend manually can get a bit inconvenient. GMT-specific legend files, or passing this syntax via an io.StringIO object, would need some more explanation.
BTW, where did you get the colors?
Good question. I think these are the RGB codes I used for the example in the AGU presentation one year ago. I thought these were the official RGB codes, but unfortunately I do not remember what went wrong here 🙁. Thanks for spotting! For the MATLAB logo, I extracted the RGB code from the logo itself, as I did not find a source for the official RGB code. Edit: Also fixed in the codes for the AGU presentation.
Regarding the MATLAB color: I still have not found an official RGB code for the orange in the MATLAB logo. On this website (https://de.mathworks.com/help/matlab/visualize/creating-the-matlab-logo.html), MATLAB shows how users can create the logo on their own. There they use the color s.FaceColor = [0.9 0.2 0.2], but the lighting effects applied afterwards change the appearance of the color.
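For comparison with the "R/G/B" strings used in the script, the MATLAB face color can be converted from fractions to 0-255 values. This is only a small sketch; the helper name `fraction_to_gmt_rgb` is hypothetical, and rounding to the nearest integer is an assumption about how the fractions map to 8-bit values.

```python
def fraction_to_gmt_rgb(fractions):
    """Convert RGB fractions in [0, 1] to a GMT-style "R/G/B" string."""
    return "/".join(str(round(255 * f)) for f in fractions)


face_color = [0.9, 0.2, 0.2]  # value quoted from the MathWorks example
print(fraction_to_gmt_rgb(face_color))
```

The result, "230/51/51", is a plain red rather than the orange extracted from the logo image ("253/131/68"), which is consistent with the lighting effects changing the final appearance.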
Fig7_PyGMT_datetime.py
Outdated
    ["238/86/52", "253/131/68", "170/121/193", "63/124/173"],
    strict=False,
):
    df = pd.read_csv(f"star_history_{file}.csv")
I also made a comment regarding the data source in the manuscript: AFAIK it's not possible to directly load the data via URL? That would make the workflow much more professional than loading (outdated) individual CSV files.
it's not possible to directly load the data via URL?
I think the answer is no; however, Fig. 4 already demonstrates how to load a dataset via a URL, so reading CSV files in this example is not that bad.
In fact, I'm not happy with the star-history data, as the stars are unevenly sampled.
GitHub provides an API to retrieve star history. The Python script below returns the timestamp when each user starred the PyGMT project. The code is about 30 lines long, so it may be too long to include in the manuscript. Nevertheless, we could use the script to obtain the full history of project stars and then resample the data at three-month intervals.
import requests
import pandas as pd

owner, repo = "GenericMappingTools", "pygmt"
# This media type makes the API include the "starred_at" timestamp.
headers = {"Accept": "application/vnd.github.v3.star+json"}
timestamps = []
page = 1
while True:
    r = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/stargazers",
        headers=headers,
        params={"per_page": 100, "page": page},
    )
    r.raise_for_status()  # stop on HTTP errors such as rate limiting
    data = r.json()
    if not data:
        break
    timestamps += [s["starred_at"] for s in data]  # full ISO 8601 timestamp
    page += 1
timestamps.sort()  # ISO strings sort chronologically
df = pd.DataFrame({"timestamp": timestamps})
df["cumulative_stars"] = range(1, len(df) + 1)
print(df)
Would be nice, but to me it still looks like it is only possible to get a link that creates a live-updated time chart for the project's GitHub README. And the CSV files one can download contain the download date in the file name. Also, the temporal sampling is a bit interesting, as it is not equally spaced and differs between repos; I would prefer to have the data points for all repos at the same dates.
Nevertheless, we could use the script to obtain the full history of project stars and then resample the data at every three-month intervals.
I guess @seisman means grouping the data into three-month buckets and showing quarterly numbers for each project. This would give the same time spacing for all repos.
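A minimal sketch of that bucketing idea, using made-up starring dates for two hypothetical repositories (`repo_a` and `repo_b` are placeholders, not real data). It assumes calendar quarters as the three-month buckets and that a quarter without new stars should carry the previous count forward.

```python
import pandas as pd

# Made-up starring timestamps for two hypothetical repositories.
raw = {
    "repo_a": ["2020-01-15", "2020-02-20", "2020-05-01", "2020-11-11"],
    "repo_b": ["2020-03-03", "2020-08-09", "2020-12-24"],
}

quarterly = {}
for name, dates in raw.items():
    stamps = pd.to_datetime(pd.Series(dates)).sort_values()
    # Cumulative star count indexed by the time each star was given.
    stars = pd.Series(range(1, len(stamps) + 1), index=pd.DatetimeIndex(stamps))
    # "QS" bins by calendar quarter and labels each bin by its start date;
    # the last value in a bin is the cumulative count at the end of that quarter.
    quarterly[name] = stars.resample("QS").last()

# Combining aligns both repositories on the same quarterly dates. Quarters
# without new stars carry the previous count forward; quarters before the
# first star are set to zero.
table = pd.DataFrame(quarterly).ffill().fillna(0)
print(table)
```

With all repositories on one shared date index, each column can be passed to the plotting loop without any further temporal alignment.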
It seems like @seisman and I wrote our comments nearly at the same time 🙃, and we are both pointing out that the star-history data does not have an equal time spacing. And I am actually having the impression that the data points that are reported in the CSV file change over time.
Tried the request script from Dongdong. I do not think we have to show this directly in the manuscript. We can include it as a separate cell in the JN, but for the example itself, it should be OK to load the saved CSV files and focus on the plotting.
I'm thinking about which columns to store in the CSV file and what to include in the manuscript:
- Date and Stars at three-month intervals. It can be plotted directly, so the example code can focus on PyGMT plotting.
- Date and Stars for each GitHub user. Stars go from 1 to the max. In the example code, we'd need to resample it to three-month intervals before plotting.
- UserName and Date for each GitHub user. This is basically the raw data. We need to do data processing to get the cumulative stars before plotting. This also makes the code follow a clearer read → process → visualize workflow.
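For the third option, the read and process steps could look roughly like the sketch below. The CSV content and user names are made up for illustration, and the column names simply follow the layout described in the list above.

```python
import io

import pandas as pd

# Made-up raw data in the "UserName, Date" layout of the third option.
csv_text = """UserName,Date
user_c,2020-05-01T12:00:00Z
user_a,2020-01-15T10:00:00Z
user_b,2020-02-20T08:30:00Z
"""

# Read: parse the ISO 8601 timestamps while loading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Process: order by time, then each row adds one star to the running total.
df = df.sort_values("Date").reset_index(drop=True)
df["Stars"] = range(1, len(df) + 1)

print(df)
```

The visualize step would then pass `df["Date"]` and `df["Stars"]` to `fig.plot` as in the script under review.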

This PR adds a JN for
- NumPy array-like data (arrays of integers / floats and datetime objects): Figure.plot

The GitHub stars of GMT (the red from the GMT logo), GMT/MEX (the orange from the MATLAB logo), GMT.jl (the purple from the Julia logo), and PyGMT (the blue from the Python logo) are used as data. The data are included as CSV files for completeness / reference.
Preview:
