Commit ac5d1d3

added courseProjectCode and courseProjectDocs folder
1 parent 5b16660 commit ac5d1d3

2 files changed

+32
-0
lines changed

courseProjectCode/project-proposal.md

Whitespace-only changes.
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
# Requirements and Test Oracles
2+
3+
## Functional Requirements
4+
5+
- **FR-1**: The system shall handle missing data by representing missing values as NaN, NA or NaT in both floating-point and non-floating-point data.
6+
- **FR-2**: The system shall support size mutability of tabular structures, allowing columns to be inserted or deleted from a DataFrame or higher-dimensional object.
7+
- **FR-3**: The system shall support both automatic and explicit data alignment: objects can be explicitly aligned to a set of labels, or the user can ignore labels and let the system align data automatically during computations.
8+
- **FR-4**: The system shall provide flexible group-by functionality to perform split-apply-combine operations for aggregating or transforming data.
9+
- **FR-5**: The system shall provide robust I/O tools for loading data from flat files (CSV and delimited), Excel files, and databases, and for saving and loading data in the HDF5 format.
10+
- **FR-6**: The system shall provide time-series-specific functionality such as date-range generation, frequency conversion, moving-window statistics, and date shifting/lagging.
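
As a rough illustration of FR-1 through FR-3, the following minimal sketch assumes the system under test is pandas with NumPy available. The column names and values are made up for the example and are not part of the requirements.

```python
import numpy as np
import pandas as pd

# FR-1: missing values appear as NaN (floats) or NaT (datetimes).
# Column names and values here are illustrative only.
df = pd.DataFrame({
    "price": [1.5, np.nan, 3.0],
    "when": pd.to_datetime(["2023-01-01", None, "2023-01-03"]),
})
missing_mask = df.isna()          # True wherever a value is missing
price_sum = df["price"].sum()     # NaN is skipped by default (skipna=True)

# FR-2: columns can be inserted and deleted in place.
df.insert(0, "id", [10, 20, 30])
cols_after_insert = list(df.columns)
del df["id"]
cols_after_delete = list(df.columns)

# FR-3: arithmetic aligns on index labels; unmatched labels yield NaN.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])
aligned_sum = a + b               # "x" has no partner in b, so it becomes NaN
```

The alignment step is the implicit form of FR-3; explicit alignment to a chosen set of labels would typically go through `reindex` or `align` instead.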
11+
12+
## Non-Functional Requirements
13+
14+
- **NFR-1**: The system shall provide fast, flexible and expressive data structures designed to make working with relational or labeled data easy and intuitive.
15+
- **NFR-2**: The system shall be powerful and flexible, aiming to be the most powerful open-source data analysis/manipulation tool available.
16+
- **NFR-3**: The system shall provide robust I/O capabilities that load and save data efficiently, including in the HDF5 format.
17+
18+
---
19+
20+
## Test Oracles
21+
22+
| Requirement ID | Requirement Description | Test Oracle (Expected Behavior) |
23+
|----------------|--------------------------|----------------------------------|
24+
| **FR-1** | Handle missing data with NaN/NA/NaT representations | When a DataFrame column contains a missing value, the system should represent it as NaN (NaT for datetime types, or NA for nullable types) and subsequent computations should treat the value as missing. |
25+
| **FR-2** | Support size mutability – columns can be inserted/deleted | After inserting a new column into a DataFrame, the number of columns increases and the new column is accessible by label; after deleting it, the column should no longer exist and the shape of the DataFrame reflects the removal. |
26+
| **FR-3** | Automatic and explicit data alignment across objects | When adding two Series objects with misaligned indexes, the system should align on index labels and introduce missing values where labels do not match. |
27+
| **FR-4** | Provide flexible group-by functionality | When grouping a DataFrame by a categorical column and applying a sum aggregation, the resulting object should contain aggregated sums for each group that equal the sum of values in the original DataFrame for that group. |
28+
| **FR-5** | Robust I/O tools for loading and saving data | Reading a CSV file containing 100 rows and 5 columns should create a DataFrame with 100 rows and 5 columns and values that match the file; saving to HDF5 and then reloading should yield an identical DataFrame. |
29+
| **FR-6** | Time-series-specific functionality | Generating a date range from “2023-01-01” to “2023-01-10” (inclusive) with a daily frequency should produce a sequence of 10 dates; shifting the resulting date index by one period should move each date forward by one day. |
30+
| **NFR-1** | Provide fast, flexible and expressive data structures | Creating and slicing a DataFrame with 10,000 rows should complete within an acceptable threshold (e.g., under 50 ms) on standard hardware, reflecting expected performance. |
31+
| **NFR-2** | Be a powerful and flexible open-source data analysis tool | The API should allow users to chain multiple operations (e.g., filtering, grouping and aggregation) in a single fluent expression; the resulting code should remain readable and the operations should execute correctly. |
32+
| **NFR-3** | Provide robust I/O capabilities | Loading a large CSV file (e.g., 1 GB) and saving it to HDF5 should not crash and should complete without data corruption; memory usage should remain within reasonable bounds relative to the file size. |
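
Some of the oracles in the table can be sketched as executable checks. The example below is one possible reading, assuming pandas is the system under test. The function names and sample data are hypothetical. The HDF5 round trip from FR-5 is left out because it needs the optional PyTables dependency, and the NFR timing and memory oracles are omitted because their thresholds depend on hardware.

```python
import io

import pandas as pd


def check_fr4_groupby_sum():
    """FR-4: each group's sum equals the sum of that group's original rows."""
    df = pd.DataFrame({"team": ["a", "b", "a", "b"], "points": [1, 2, 3, 4]})
    result = df.groupby("team")["points"].sum()
    for team, total in result.items():
        assert total == df.loc[df["team"] == team, "points"].sum()
    return result


def check_fr5_csv_shape():
    """FR-5 (CSV part only): a 100x5 CSV loads as a matching 100x5 DataFrame."""
    # Build the CSV in memory so the check does not touch the file system.
    source = pd.DataFrame({f"c{i}": range(100) for i in range(5)})
    buffer = io.StringIO(source.to_csv(index=False))
    loaded = pd.read_csv(buffer)
    assert loaded.shape == (100, 5)
    # Compare values and labels; dtypes are not part of this oracle.
    pd.testing.assert_frame_equal(loaded, source, check_dtype=False)
    return loaded


def check_fr6_date_range_and_shift():
    """FR-6: ten daily dates; shifting by one period adds one day to each."""
    dates = pd.date_range("2023-01-01", "2023-01-10", freq="D")
    assert len(dates) == 10
    shifted = dates.shift(1)
    assert (shifted - dates == pd.Timedelta(days=1)).all()
    return dates, shifted
```

Each function raises `AssertionError` if its oracle fails and returns the computed objects otherwise, so a test runner such as pytest could call them directly.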
