
Commit dad0964 (parent: 274a429)

Authored by yang-chengg and Copilot

add feature registry app code (#548)

add v1 feature registry app code

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

35 files changed: +1829 / -6 lines

feature-registry-app/.gitignore

Lines changed: 11 additions & 0 deletions

```
# Databricks
.databricks/
.databricks.sync-snapshots

# Python
__pycache__/
.pytest_cache/
.coverage

# Environment & Config
deploy_config.sh
```

feature-registry-app/README.md

Lines changed: 59 additions & 6 deletions

````diff
@@ -7,13 +7,14 @@ date: 2025-08-05
 
 # 🚀 Feature Registry Application
 
-This application provides a modern interface for discovering and managing features with seamless integration to Unity Catalog.
+This is a modern web application that allows users to interact with the Databricks Feature Registry. The app provides a user-friendly interface for exploring existing features in Unity Catalog. Additionally, users can generate code for creating feature specs and training sets to train machine learning models and deploy features as Feature Serving Endpoints.
 
 ## ✨ Features
 
-- 🔍 List and search for features
+- 🔍 List and search for features in Unity Catalog
 - 🔒 On-behalf-of-user authentication
 - ⚙️ Code-gen for creating feature specs and training sets
+- 📋 Configurable catalog allow-listing for access control
 
 ## 🏗️ Architecture
 
@@ -25,11 +26,63 @@ The application is built with:
 
 ![Feature Registry Interface](./images/feature-registry-interface.png)
 
+## 🚀 Deployment
+
+### Create an App
+1. Log into your destination Databricks workspace and navigate to "Compute > Apps"
+2. Click on "Create App" and select "Create a custom app"
+3. Enter an app name and click "Create app"
+
+### Customization
+1. Create a file named `deploy_config.sh` in the root folder with the following variables:
+```sh
+# Path to a destination folder in the default Databricks workspace where source code will be synced
+export DEST=/Workspace/Users/Path/To/App/Code
+# Name of the App to deploy
+export APP_NAME=your-app-name
+```
+Or simply run `./deploy.sh`; it will create a template file if one doesn't exist.
+
+2. Update `deploy_config.sh` with the config for your environment
+
+3. Ensure the Databricks CLI is installed and configured on your machine. The "DEFAULT" profile should point to the destination workspace where the app will be deployed. You can find instructions here for [AWS](https://docs.databricks.com/dev-tools/cli/index.html) / [Azure](https://learn.microsoft.com/en-us/azure/databricks/dev-tools/cli/)
+
+### Deploy the App
+1. Navigate to the app directory
+2. Run the `./deploy.sh` shell command. This will sync the app code to the destination workspace location and deploy the app
+3. Navigate to the Databricks workspace and access the app via "Compute > Apps"
+
+## 🔐 Access Control
+
+### Catalog Allow-Listing
+
+By default, the Feature Registry App shows all the catalogs to which the user has read access. You can restrict which Unity Catalog catalogs users can explore for features. This is useful for:
+- Limiting feature discovery to production-ready catalogs
+- Ensuring data scientists only access approved feature sets
+- Organizing features by teams or projects
+
+#### Setting Up Allow-Listed Catalogs
+
+1. Edit the `src/uc_catalogs_allowlist.yaml` file
+2. Uncomment and add the catalog names you want to allow:
+
+```yaml
+# List catalogs that should be accessible in the Feature Registry App
+- production_features
+- team_a_catalog
+- ml_features_catalog
+```
+
+3. If the file is empty or all entries are commented out, the app will show all catalogs available to the user
+4. Deploy the app with the updated configuration
+
+**Note:** Users will still need appropriate permissions in Unity Catalog to access the data within these catalogs. The allow-list acts as an additional filter on top of existing permissions.
+
 ## 🔑 Requirements
 
 The application requires the following scopes:
-- `catalog.catalogs`
-- `catalog.schemas`
-- `catalog.tables`
+- `catalog.catalogs:read`
+- `catalog.schemas:read`
+- `catalog.tables:read`
 
-The app owner needs to grant other users `Can Use` permission for the app itself, along with the access to the underlying Datarbricks resources.
+The app owner needs to grant other users `Can Use` permission for the app itself, along with access to the underlying Databricks resources.
````
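The allow-list rule described above (an empty or fully commented-out file shows every catalog the user can read; otherwise the list acts as an extra filter on top of Unity Catalog permissions) can be sketched in plain Python. The function names here are hypothetical illustrations, not part of the app's actual source, and a real implementation would likely use a YAML library:

```python
def parse_allowlist(text: str) -> list[str]:
    # Hypothetical parser for the simple "- catalog_name" list format used by
    # uc_catalogs_allowlist.yaml; comment lines (starting with "#") and blank
    # lines are ignored.
    entries: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("- "):
            entries.append(line[2:].strip())
    return entries


def filter_catalogs(catalogs: list[str], allowlist: list[str]) -> list[str]:
    # An empty (or fully commented-out) allowlist means "show every catalog
    # the user can already read"; otherwise keep only allow-listed names.
    if not allowlist:
        return catalogs
    return [c for c in catalogs if c in allowlist]
```

With the sample file above, a catalog named `scratch` would be hidden even if the user has read access to it.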

feature-registry-app/deploy.sh

Lines changed: 22 additions & 0 deletions

```sh
echo_red() {
  echo "\033[1;31m$*\033[0m"
}

# Validate the current folder
[[ -d "./src" && -f "./src/app.yaml" ]] || { echo_red "Error: Couldn't find app.yaml. \nPlease run this script from the //sandbox/feature-registry-app directory."; exit 1; }

# Users: Make sure you have a ./deploy_config.sh file that sets the necessary variables for this script.
[ -f "./deploy_config.sh" ] || {
  cat <<EOF > deploy_config.sh
# Path to a folder in the workspace. E.g. /Workspace/Users/Path/To/App/Code
export DEST=""
# Name of the App to deploy. E.g. your-app-name
export APP_NAME=""
EOF
  echo_red "Please update deploy_config.sh and run again."
  exit 1;
}
source ./deploy_config.sh

databricks sync --full ./src $DEST
databricks apps deploy $APP_NAME --source-code-path $DEST
```

feature-registry-app/pytest.ini

Lines changed: 24 additions & 0 deletions

```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*

# Add src directory to Python path
pythonpath = src

# Coverage settings
[coverage:run]
source = src
omit =
    */__pycache__/*
    */tests/*

[coverage:report]
exclude_lines =
    pragma: no cover
    def __repr__
    raise NotImplementedError
    if __name__ == .__main__.:
    pass
    raise ImportError
```
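As a quick illustration of the discovery rules above, a hypothetical file such as `tests/test_example.py` would be collected because its name, class, and function all match `python_files`, `python_classes`, and `python_functions`:

```python
# tests/test_example.py (hypothetical) -- discovered because the filename
# matches python_files = test_*.py under testpaths = tests.
def add(a: int, b: int) -> int:
    return a + b


class TestAdd:  # collected: matches python_classes = Test*
    def test_add_ints(self):  # collected: matches python_functions = test_*
        assert add(2, 3) == 5
```

The `pythonpath = src` setting is what lets such tests import the app modules directly without installing the package.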

feature-registry-app/src/app.yaml

Lines changed: 11 additions & 0 deletions

```yaml
command: [
  "streamlit",
  "run",
  "feature_registry.py"
]

env:
  - name: STREAMLIT_BROWSER_GATHER_USAGE_STATS
    value: "false"
  # Path of the yaml file that contains the allow-listed UC catalogs. The
  # Feature Registry App restricts the search of features to this list.
  - name: UC_CATALOGS_ALLOWLIST
    value: "uc_catalogs_allowlist.yaml"
```
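Inside the app, the allowlist path configured in `app.yaml` would typically be read back from the environment. A minimal sketch, assuming the app falls back to the default file name when the variable is unset (the helper name is hypothetical):

```python
import os


def allowlist_path(default: str = "uc_catalogs_allowlist.yaml") -> str:
    # app.yaml injects UC_CATALOGS_ALLOWLIST into the app's environment;
    # fall back to the default file name if the variable is not set.
    return os.environ.get("UC_CATALOGS_ALLOWLIST", default)
```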
Lines changed: 24 additions & 0 deletions

```python
from databricks.sdk import WorkspaceClient


class UcClient:
    def __init__(self, user_access_token: str):
        self.w = WorkspaceClient(token=user_access_token, auth_type="pat")

    def get_catalogs(self):
        return self.w.catalogs.list(include_browse=False)

    def get_schemas(self, catalog_name: str):
        return self.w.schemas.list(catalog_name=catalog_name)

    def get_tables(self, catalog_name: str, schema_name: str):
        return self.w.tables.list(catalog_name=catalog_name, schema_name=schema_name)

    def get_table(self, full_name: str):
        return self.w.tables.get(full_name=full_name)

    def get_functions(self, catalog_name: str, schema_name: str):
        return self.w.functions.list(catalog_name=catalog_name, schema_name=schema_name)

    def get_function(self, full_name: str):
        return self.w.functions.get(name=full_name)
```

feature-registry-app/src/entities/__init__.py

Whitespace-only changes.
Lines changed: 72 additions & 0 deletions

```python
from typing import Any, Dict, List, Optional, Tuple

from pydantic import BaseModel

from .tables import Table


class MaterializedInfo(BaseModel):
    schema_name: str
    table_name: str
    primary_keys: List[str]
    timeseries_columns: List[str]


class Feature:
    def __init__(
        self, name: str, table: Table, pks: List[str], ts: Optional[List[str]] = None
    ):
        self.name = name
        self.table = table
        self.pks = pks
        self.ts = ts or []

    def get_materialized_info(self) -> MaterializedInfo:
        return MaterializedInfo(
            schema_name=self.table.schema(),
            table_name=self.table.name(),
            primary_keys=self.pks or [],
            timeseries_columns=self.ts or [],
        )

    def description(self) -> str:
        for column in self.table.uc_table.columns:
            if column.name == self.name:
                return column.comment
        return ""

    def components(self) -> Tuple[str, str, str]:
        return self.name, self.table.full_name(), ", ".join(self.pks)

    def metadata(self) -> Dict[str, Any]:
        return {
            "Table Name": self.table.full_name(),
            "Primary Keys": self.pks,
            "Timeseries Columns": self.ts,
            "# of Features": len(self.table.uc_table.columns) - len(self.pks),
            "Table Type": self.table.uc_table.table_type.name,
        }

    def inputs(self) -> Dict[str, str] | None:
        return None

    def outputs(self) -> Dict[str, str] | None:
        return None

    def code(self) -> str:
        return self.table.uc_table.view_definition

    def table_name(self) -> str:
        return self.table.full_name()

    def full_name(self) -> str:
        return f"{self.table.full_name()}.{self.name}"


class SelectableFeature:
    def __init__(self, feature: Feature, selected: bool = False):
        self.feature = feature
        self.selected = selected

    def components(self) -> Tuple[bool, str, str, str]:
        return (self.selected,) + self.feature.components()
```
Lines changed: 30 additions & 0 deletions

```python
from typing import Any, Dict, Tuple

from databricks import sdk
from pydantic import BaseModel


class FeatureFunction(BaseModel):
    function: sdk.service.catalog.FunctionInfo

    def full_name(self) -> str:
        return self.function.full_name

    def components(self) -> Tuple[str, str, Any, Any]:
        return self.full_name(), "feature spec", None, None

    def metadata(self) -> Dict[str, Any] | None:
        return None

    def inputs(self) -> Dict[str, str] | None:
        if self.function.input_params and self.function.input_params.parameters:
            return {p.name: p.type_text for p in self.function.input_params.parameters}
        return None

    def outputs(self) -> Dict[str, str] | None:
        if self.function.return_params and self.function.return_params.parameters:
            return {p.name: p.type_text for p in self.function.return_params.parameters}
        return None

    def code(self) -> str:
        return self.function.routine_definition
```
Lines changed: 20 additions & 0 deletions

```python
from typing import Tuple

from databricks import sdk


class Table:
    def __init__(self, uc_table: sdk.service.catalog.TableInfo):
        self.uc_table = uc_table

    def full_name(self) -> str:
        return self.uc_table.full_name

    def name(self) -> str:
        return self.uc_table.name

    def schema(self) -> str:
        return self.uc_table.schema_name

    def components(self) -> Tuple[str, str, str]:
        return self.uc_table.catalog_name, self.uc_table.schema_name, self.uc_table.name
```
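To see how the entity classes fit together, a stub standing in for the SDK's `TableInfo` (hypothetical, for illustration only) shows the three-level `catalog.schema.table` naming that `Table` exposes and that `Feature.full_name()` extends with the feature column name:

```python
from dataclasses import dataclass


@dataclass
class StubTableInfo:
    # Minimal stand-in for sdk.service.catalog.TableInfo; only the fields
    # that Table actually reads are modeled here.
    catalog_name: str
    schema_name: str
    name: str

    @property
    def full_name(self) -> str:
        return f"{self.catalog_name}.{self.schema_name}.{self.name}"


info = StubTableInfo("prod", "ml", "user_features")
# Table.components() would return the three name parts...
parts = (info.catalog_name, info.schema_name, info.name)
# ...and Feature.full_name() appends the feature column to the table's full name.
feature_full_name = f"{info.full_name}.age"
```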
