Exporters Guide¶
Exporters transform collector results into structured files. FileExporter is the main exporter, supporting JSON and CSV formats.
FileExporter¶
Overview¶
FileExporter creates a timestamped folder structure, ensuring multiple runs don't overwrite previous results.
Output structure:
```
output/
├── 20260219_120000/
│   ├── log.txt
│   ├── summary.json
│   ├── workspaces.csv (or .json)
│   ├── reports.csv
│   ├── datasets.csv
│   └── ...
└── 20260219_150000/
    └── ...
```
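The run-folder name follows the `YYYYMMDD_HHMMSS` pattern. As a minimal sketch of how such a folder can be created with the standard library (the helper name is illustrative, not part of fabricgov's API):

```python
from datetime import datetime
from pathlib import Path

def make_run_dir(output_dir: str = "output") -> Path:
    # Builds a timestamped folder such as output/20260219_120000/
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = Path(output_dir) / stamp
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir
```

Because each run gets its own stamped folder, earlier exports are never overwritten.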
Basic Usage¶
JSON¶
```python
from fabricgov.exporters import FileExporter

# result is the object returned by a collector's collect();
# log_messages is the list of progress lines gathered during the run
exporter = FileExporter(format="json", output_dir="output")
output_path = exporter.export(result, log_messages)
print(f"✅ Files exported to: {output_path}")
```
CSV¶
```python
exporter = FileExporter(format="csv", output_dir="output")
output_path = exporter.export(result, log_messages)
```
Parameters¶
```python
FileExporter(
    format: Literal["json", "csv"] = "json",
    output_dir: str = "output",
    run_dir: str | None = None,
)
```
| Parameter | Type | Description | Default |
|---|---|---|---|
| `format` | `"json"` or `"csv"` | Export format | `"json"` |
| `output_dir` | `str` | Root directory where timestamped folders are created | `"output"` |
| `run_dir` | `str \| None` | Direct destination folder (no timestamp created). Used by `collect all` to keep all steps in a single folder | `None` |
When to use `run_dir`:
- Normal CLI usage: leave unset → `output/YYYYMMDD_HHMMSS/` is created automatically
- `fabricgov collect all`: managed internally to ensure all steps write to the same folder
File Structure¶
Always created¶
log.txt¶
Full execution log with progress, summary, and artifact counts.
summary.json¶
Always written as JSON, regardless of the chosen export format.
```json
{
  "total_workspaces": 302,
  "total_items": 1367,
  "items_by_type": {
    "reports": 777,
    "datasets": 506
  },
  "scan_duration_seconds": 23.82,
  "batches_processed": 4
}
```
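Because summary.json is always plain JSON, it is straightforward to post-process; a minimal sketch with the standard library (the helper name is illustrative, not part of fabricgov):

```python
import json
from pathlib import Path

def load_summary(run_dir: str) -> dict:
    # summary.json sits at the top of the run folder, regardless of export format
    with open(Path(run_dir) / "summary.json", encoding="utf-8") as f:
        return json.load(f)
```

Usage: `load_summary("output/20260219_120000")["total_workspaces"]`.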
Conditional files¶
One file per artifact type (only created if count > 0):
- workspaces.json / workspaces.csv
- reports.json / reports.csv
- datasets.json / datasets.csv
- workspace_access.json / workspace_access.csv
- refresh_history.json / refresh_history.csv
- domains.json / domains.csv
- capacities.json / capacities.csv
- ... (all collected types)
JSON Format¶
- ✅ Hierarchical structure preserved
- ✅ Nested arrays and objects maintained
- ✅ UTF-8 encoding
- ✅ Pretty-printed (2-space indentation)
- ✅ Easy to import in Python, Power BI, etc.
CSV Format¶
- ✅ Compatible with Excel, Power BI, Pandas
- ✅ Nested objects are flattened (e.g., `user.name` → `user_name`)
- ✅ Arrays are converted to JSON strings
- ✅ UTF-8 encoding
- ✅ Column headers included
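The flattening rules above can be sketched as follows (a simplified illustration, not fabricgov's actual implementation):

```python
import json

def flatten(record: dict, parent: str = "", sep: str = "_") -> dict:
    """Flatten nested dicts into underscore-joined columns; dump lists as JSON strings."""
    flat = {}
    for key, value in record.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))  # recurse into nested objects
        elif isinstance(value, list):
            flat[column] = json.dumps(value)          # arrays become JSON strings
        else:
            flat[column] = value
    return flat

row = flatten({"id": "ds-1", "owner": {"name": "Ana"}, "tags": ["prod", "pii"]})
# → {"id": "ds-1", "owner_name": "Ana", "tags": '["prod", "pii"]'}
```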
Flattening Example¶
Original JSON:
```json
{
  "id": "dataset-123",
  "sensitivityLabel": {
    "labelId": "label-456",
    "labelName": "Confidential"
  }
}
```
Flattened CSV:
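Applying the underscore-join convention described above, the row would look roughly like this:

```csv
id,sensitivityLabel_labelId,sensitivityLabel_labelName
dataset-123,label-456,Confidential
```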
Reading in Pandas:
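A minimal sketch of reading a flattened file with pandas (the inline CSV text stands in for the real file; in practice you would call `pd.read_csv("output/20260219_120000/datasets.csv")`):

```python
import io

import pandas as pd

# Sample of a flattened datasets.csv
csv_text = (
    "id,sensitivityLabel_labelId,sensitivityLabel_labelName\n"
    "dataset-123,label-456,Confidential\n"
)
datasets = pd.read_csv(io.StringIO(csv_text))

# Flattened columns are addressed directly by their underscore-joined names
confidential = datasets[datasets["sensitivityLabel_labelName"] == "Confidential"]
```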
Advanced Usage¶
Multiple collectors in the same run¶
```python
from datetime import datetime

from fabricgov.auth import ServicePrincipalAuth
from fabricgov.collectors import WorkspaceInventoryCollector, WorkspaceAccessCollector
from fabricgov.exporters import FileExporter

auth = ServicePrincipalAuth.from_env()
log_messages = []

def progress(msg):
    # Timestamp each progress line and keep it for the exported log
    ts = f"[{datetime.now().strftime('%H:%M:%S')}] {msg}"
    print(ts)
    log_messages.append(ts)

inventory_result = WorkspaceInventoryCollector(auth=auth, progress_callback=progress).collect()
access_result = WorkspaceAccessCollector(auth=auth, inventory_result=inventory_result).collect()

exporter = FileExporter(format="csv", output_dir="output")
exporter.export(inventory_result, log_messages)
exporter.export(access_result, [])
```
Integration with Other Tools¶
Power BI Desktop¶
Import CSV:
- Open Power BI Desktop → Get Data → Text/CSV
- Select `workspaces.csv`, `reports.csv`, etc.
- Relate tables via `workspace_id`

Import JSON:
- Get Data → JSON → Transform Data → Expand columns
Python / Pandas¶
```python
from pathlib import Path

import pandas as pd

output_dir = Path("output/20260219_120000")
workspaces = pd.read_csv(output_dir / "workspaces.csv")
datasets = pd.read_csv(output_dir / "datasets.csv")

# Enrich each dataset row with its workspace's name and capacity
df = datasets.merge(
    workspaces[['id', 'name', 'capacityId']],
    left_on='workspace_id',
    right_on='id',
    suffixes=('_dataset', '_workspace')
)
```
Azure Data Lake / Blob Storage¶
```python
from pathlib import Path

from azure.storage.blob import BlobServiceClient

# conn_str holds your storage-account connection string
blob_service = BlobServiceClient.from_connection_string(conn_str)
container_client = blob_service.get_container_client("governance")

output_dir = Path("output/20260219_120000")
for file_path in output_dir.glob("*"):
    blob_client = container_client.get_blob_client(file_path.name)
    with open(file_path, "rb") as data:
        blob_client.upload_blob(data, overwrite=True)
```
SQL Database¶
```python
import sqlite3
from pathlib import Path

import pandas as pd

conn = sqlite3.connect("governance.db")
output_dir = Path("output/20260219_120000")

# One table per exported CSV, named after the file stem
for csv_file in output_dir.glob("*.csv"):
    table_name = csv_file.stem
    df = pd.read_csv(csv_file)
    df.to_sql(table_name, conn, if_exists="replace", index=False)

conn.close()
print("✅ Data inserted into SQLite database")
```
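Once loaded, the tables can be joined directly in SQL. A minimal sketch, using an in-memory database with sample rows for illustration (in practice you would connect to `governance.db`; table and column names follow the exported CSVs):

```python
import sqlite3

# Illustrative in-memory stand-in for governance.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workspaces (id TEXT, name TEXT)")
conn.execute("CREATE TABLE datasets (id TEXT, workspace_id TEXT)")
conn.executemany("INSERT INTO workspaces VALUES (?, ?)", [("w1", "Sales"), ("w2", "HR")])
conn.executemany("INSERT INTO datasets VALUES (?, ?)", [("d1", "w1"), ("d2", "w1")])

# Datasets per workspace, mirroring the pandas merge shown earlier
rows = conn.execute(
    """
    SELECT w.name AS workspace, COUNT(d.id) AS dataset_count
    FROM workspaces w
    LEFT JOIN datasets d ON d.workspace_id = w.id
    GROUP BY w.name
    ORDER BY dataset_count DESC
    """
).fetchall()
# rows → [('Sales', 2), ('HR', 0)]
```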
Known Limitations¶
- CSV with nested arrays: Arrays are converted to JSON strings → requires manual parsing
- Long column names: Deeply nested objects generate names like `extendedProperties_DwProperties_endpoint`
- File size: Large tenants (1000+ workspaces) may generate 50–100 MB files
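The first limitation is easy to work around: a JSON-string cell can be parsed back into a Python list with the standard library (the `users` column name below is hypothetical):

```python
import json

# A CSV cell holding an array arrives as a JSON string, e.g.:
cell = '["alice@contoso.com", "bob@contoso.com"]'
users = json.loads(cell)  # back to a Python list
# With pandas, apply the same parse column-wide:
#   df["users"] = df["users"].apply(json.loads)
```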