Technical Limitations
This document lists the known technical limitations of the fabricgov library, including API restrictions, performance considerations, and unsupported use cases.
API Limitations
Rate Limiting: Power BI Admin APIs
Affected APIs:
- GET /admin/groups/{groupId}/users (WorkspaceAccessCollector)
- GET /admin/reports/{reportId}/users (ReportAccessCollector)
- GET /admin/datasets/{datasetId}/users (DatasetAccessCollector)
- GET /admin/dataflows/{dataflowId}/users (DataflowAccessCollector)
- GET /admin/datasets/{datasetId}/refreshes (RefreshHistoryCollector)
- GET /admin/dataflows/{dataflowId}/transactions (RefreshHistoryCollector)
Observed limit: ~200 requests/hour (not officially documented by Microsoft)
Behavior:
- After ~200 requests, the API returns 429 Too Many Requests
- The limit appears to be a sliding window, not a fixed 1-hour reset
- Pausing for 30 seconds and retrying is not sufficient
- A pause of ~1h30min is needed for the limit to fully reset
Impact:
- Small tenants (<200 workspaces/reports): no impact
- Medium tenants (200-1000): requires 2-5 runs
- Large tenants (1000+): requires multiple sessions over several hours
Implemented solution:
- Automatic checkpoint system
- Collection can be resumed across multiple runs
- Scripts stop upon detecting the rate limit (fail fast)
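The fail-fast and resume behavior described above can be sketched as follows. This is a minimal illustration, not fabricgov's actual implementation: the `RateLimited` exception, the `fetch` callable, and the checkpoint layout (`processed_ids` plus `data`) are all assumptions made for the example.

```python
import json
from pathlib import Path


class RateLimited(Exception):
    """Hypothetical signal raised by `fetch` when the API answers 429."""


def collect_with_checkpoint(item_ids, fetch, checkpoint_path):
    """Fetch each item once, saving progress so a later run can resume."""
    path = Path(checkpoint_path)
    state = {"processed_ids": [], "data": []}
    if path.exists():
        # Resume: reload what an earlier run already collected.
        state = json.loads(path.read_text(encoding="utf-8"))
    done = set(state["processed_ids"])

    finished = True
    for item_id in item_ids:
        if item_id in done:
            continue  # already collected in an earlier run
        try:
            result = fetch(item_id)
        except RateLimited:
            finished = False  # fail fast: stop instead of retrying
            break
        state["processed_ids"].append(item_id)
        state["data"].append(result)

    # Compact JSON keeps the checkpoint file small.
    path.write_text(json.dumps(state, separators=(",", ":")), encoding="utf-8")
    return finished, state["data"]
```

Calling the function again with the same `checkpoint_path` after the rate limit has reset skips every ID already recorded and continues with the rest.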
Time estimates:
| Item count | Total time | Runs needed |
|------------|------------|-------------|
| 100 items | ~5 min | 1 |
| 200 items | ~10 min | 1 |
| 500 items | ~1h (with pauses) | 3 |
| 1000 items | ~3-5h (with pauses) | 5-7 |
| 2000 items | ~8-12h (with pauses) | 10-15 |
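As a rough planning aid, the lower end of the "Runs needed" column matches a simple ceiling division by the observed ~200-request budget. The sketch below assumes one request per item and that every run gets the full budget; since the limit is undocumented and can vary, treat the result as a lower bound only.

```python
import math


def min_runs(item_count, requests_per_window=200):
    """Lower bound on collection runs, assuming one request per item."""
    return max(1, math.ceil(item_count / requests_per_window))


for items in (100, 200, 500, 1000, 2000):
    print(items, min_runs(items))
```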
Personal Workspaces
Problem:
Personal Workspaces (format: "PersonalWorkspace Name (email)") do not support the following Admin APIs:
- GET /admin/groups/{groupId}/users
- GET /admin/reports/{reportId}/users
Observed behavior:
- Requests for users return 404 Not Found
- In some cases, they return 429 Too Many Requests (consuming the rate limit unnecessarily)
Implemented solution:
- WorkspaceAccessCollector automatically filters out Personal Workspaces before making API calls
- ReportAccessCollector automatically filters out reports inside Personal Workspaces
- This dramatically reduces unnecessary requests
Impact on corporate tenants:
- In typical tenants, 30-60% of workspaces are Personal Workspaces
- Example: 302 total workspaces, of which 186 are Personal (62%), so only 116 need to be collected
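The filtering step can be illustrated with a small sketch. It is not the collectors' real code: the dictionary keys `name` and `type`, and the `"PersonalGroup"` type value, are assumptions about the inventory payload; only the "PersonalWorkspace Name (email)" name format comes from this document.

```python
def is_personal_workspace(workspace):
    """Heuristic check based on the workspace type or its name prefix."""
    name = workspace.get("name", "")
    # "PersonalGroup" is an assumed type value; the name prefix follows
    # the "PersonalWorkspace Name (email)" format described above.
    return workspace.get("type") == "PersonalGroup" or name.startswith("PersonalWorkspace ")


def split_collectable(workspaces):
    """Return the workspaces worth querying and how many were skipped."""
    collectable = [w for w in workspaces if not is_personal_workspace(w)]
    return collectable, len(workspaces) - len(collectable)
```

Applied to the example above, a tenant with 302 workspaces of which 186 are personal would yield 116 collectable workspaces and 186 skipped.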
Admin Scan API: WorkspaceInventoryCollector
Batching limit: 100 workspaces per scan request
Processing time:
- Each scan takes ~5-10 seconds
- Large tenants (500+ workspaces) require multiple sequential scans
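The 100-workspace batching limit means the ID list has to be split before scanning. A minimal sketch of that split (the function name is illustrative, not part of fabricgov):

```python
def batch_ids(workspace_ids, batch_size=100):
    """Yield consecutive slices of at most `batch_size` workspace IDs."""
    for start in range(0, len(workspace_ids), batch_size):
        yield workspace_ids[start:start + batch_size]


ids = [f"ws-{n}" for n in range(250)]
print([len(batch) for batch in batch_ids(ids)])  # [100, 100, 50]
```

With ~5-10 seconds per scan, the 250 workspaces in this example would need three sequential scans.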
Data limitations:
- The scan returns a snapshot at a point in time, not real-time data
- Data may be slightly stale (seconds/minutes)
- Scans do not return historical data or time-series metrics
Fields not returned by the scan:
- Dataset refresh history
- Detailed capacity consumption
- Audit logs
- Executed queries
Permission Limitations
Service Principal
Required permissions:
- Tenant.Read.All (Application permission)
- Workspace.ReadWrite.All (Application permission)
- Service Principal must be in the Fabric Administrators group
What a Service Principal CANNOT do:
- Access workspaces/reports without explicit permission (even as Admin)
- View dataset content (data, queries)
- Execute DAX queries directly on datasets (requires user context)
- Access APIs requiring delegated permissions (user context)
Note on Admin APIs:
- Admin APIs allow listing and inspecting resources
- They do NOT allow executing or modifying dataset/report content
Device Flow
Requirements:
- The authenticating user must have the Fabric Administrator role in the tenant
- MFA is supported automatically
- Requires human interaction (cannot be automated)
Limitations:
- Token expires in ~1 hour
- Token cache is local (does not persist across machines)
- Not recommended for CI/CD or automation
Checkpoint Limitations
Data size
Checkpoint stores:
- List of processed IDs
- Partial data collected up to that point
Potential issue in very large tenants:
- Checkpoint files can grow to tens of MB
- Example: 5,000 reports with 10 accesses each = ~50MB checkpoint
- Loading/saving the checkpoint may take a few seconds
Mitigation:
- Checkpoint stores only IDs and partial data; it does not duplicate the inventory
- Compact JSON format
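The 50MB example works out to roughly 1 KB per access record (5,000 x 10 = 50,000 records). The sketch below turns that arithmetic into a helper; the 1,000-byte default is inferred from the example above and is an assumption, since real record sizes depend on the fields collected.

```python
def estimate_checkpoint_mb(item_count, accesses_per_item, bytes_per_record=1000):
    """Rough checkpoint size in MB (decimal), given an assumed record size."""
    total_bytes = item_count * accesses_per_item * bytes_per_record
    return total_bytes / 1_000_000


print(estimate_checkpoint_mb(5000, 10))  # 50.0
```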
Checkpoint invalidation
Checkpoint becomes invalid if:
- You re-run the inventory collection (new IDs are generated)
- Workspaces/reports are deleted between runs
- The structure of inventory_result changes
Symptoms:
- Checkpoint is detected but no items are skipped
- Collection processes items that appear duplicated
Solution:
- Manually delete the checkpoint: rm output/checkpoint_*.json
- Re-run the collection from scratch
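The "detected but nothing skipped" symptom can be checked before a run, and the cleanup can be done portably from Python instead of `rm`. Both helpers below are illustrative sketches: the function names are hypothetical, and the staleness test (no overlap between checkpointed IDs and the current inventory) is a heuristic, not fabricgov's own validation logic.

```python
from pathlib import Path


def checkpoint_is_stale(processed_ids, inventory_ids):
    """True when none of the checkpointed IDs exist in the current inventory."""
    if not processed_ids:
        return False  # an empty checkpoint cannot conflict with anything
    return set(processed_ids).isdisjoint(inventory_ids)


def delete_checkpoints(output_dir="output"):
    """Remove checkpoint_*.json files and return the deleted file names."""
    removed = []
    for path in Path(output_dir).glob("checkpoint_*.json"):
        path.unlink()
        removed.append(path.name)
    return sorted(removed)
```

Only files matching `checkpoint_*.json` are touched, so other exports in the same directory are left in place.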
Performance Limitations
Inventory (WorkspaceInventoryCollector)
Expected performance:
- ~100 workspaces: 5-10 seconds
- ~500 workspaces: 30-60 seconds
- ~1,000 workspaces: 1-2 minutes
Main bottleneck: API scan time (not controllable)
Access Collectors
Expected performance (WITH checkpoint):
- ~200 items: 3-5 minutes
- ~500 items: ~1h (with rate limit pauses)
- ~1,000 items: ~3-5h (with pauses)
Expected performance (WITHOUT checkpoint):
- Not feasible for >200 items (terminal blocked for hours)
Export (FileExporter)
Expected performance:
- JSON: fast up to 100MB
- CSV: may be slow with large datasets (object flattening overhead)
CSV limitations:
- Nested arrays become JSON strings (requires manual parsing)
- Deeply nested objects generate long column names
- Excel has a ~1M row limit
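The two flattening effects listed above (arrays serialized as JSON strings, nested objects producing long column names) can be reproduced with a short sketch. This is an assumed approximation of the behavior, not FileExporter's actual code; the dot separator is a choice made for the example.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level; lists become JSON strings."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, column, sep))
        elif isinstance(value, list):
            flat[column] = json.dumps(value, ensure_ascii=False)
        else:
            flat[column] = value
    return flat


row = flatten({"id": 1, "owner": {"user": {"email": "a@b.c"}}, "tags": ["x", "y"]})
print(row)
# Reading a list column back requires an explicit parse:
print(json.loads(row["tags"]))
```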
Unsupported Features
Consumption metrics collection
Not implemented (out of scope):
- CU consumption per workspace/dataset
- Executed queries and query performance
Reason:
- Requires access to the Capacity Metrics App dataset via DAX
- fabricgov focuses on governance (permissions, inventory, refresh), not performance monitoring
Resource modification
fabricgov is READ-ONLY:
- Does not modify workspaces, reports, or datasets
- Does not create, delete, or change permissions
- Does not execute refreshes or queries
Reason: focused on governance and assessment, not operational automation
Real-time collection
Limitations:
- All data represents point-in-time snapshots
- No streaming or WebSockets
- No real-time change detection
Unsupported use cases:
- Continuous monitoring
- Real-time alerts
- Live dashboards
Multi-tenancy
Current limitation:
- Collects one tenant at a time
- No support for aggregating data from multiple tenants
- Service Principal is tenant-specific
Workaround:
- Run collection separately for each tenant
- Aggregate results manually after export
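The manual aggregation step could look like the sketch below, which tags each record with the tenant it came from before combining the lists. The input shape (a mapping from a tenant label to a list of exported records) and the `tenant` column name are assumptions for illustration.

```python
def merge_tenant_exports(exports):
    """Combine per-tenant record lists, adding a `tenant` field to each record."""
    merged = []
    for tenant, records in exports.items():
        for record in records:
            merged.append({"tenant": tenant, **record})
    return merged


combined = merge_tenant_exports({
    "contoso": [{"workspace": "Finance"}],
    "fabrikam": [{"workspace": "Sales"}, {"workspace": "HR"}],
})
print(len(combined))  # 3
```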
Known Issues
Issue #1: Checkpoint not detected after a long timeout
Scenario:
- A checkpoint is saved
- More than 24 hours pass before the next run
- The next run does not detect the checkpoint
Cause: inventory_result.json may be outdated
Solution:
- Re-run inventory collection
- Delete old checkpoints before resuming
Issue #2: Special characters in workspace/report names
Scenario:
- Workspaces/reports with emojis or rare unicode characters
- CSV may not render correctly in Excel
Solution:
- Use JSON format instead of CSV
- Or import the CSV with explicit UTF-8 encoding
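If you post-process an export yourself, one alternative to importing with explicit encoding is to rewrite the CSV with a UTF-8 byte order mark, which Excel uses to detect the encoding when a file is opened directly. The helper below is a standalone sketch using Python's standard `csv` module; it is not a fabricgov function.

```python
import csv


def write_csv_for_excel(path, rows, fieldnames):
    """Write rows as UTF-8 with a BOM so Excel detects the encoding."""
    # "utf-8-sig" prepends the BOM; newline="" is what the csv module expects.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```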
Issue #3: Service Principal without permissions returns a generic error
Scenario:
- SP is not in the Fabric Administrators group
- Error returned: 403 Forbidden with a generic message
Solution:
- Validate permissions following docs/en/authentication.md
- Wait up to 15 minutes after adding the Service Principal to the group (permission propagation)
Microsoft-Documented Limitations
Admin APIs may change without notice
Microsoft does not guarantee:
- Stability of Admin APIs (may change at any time)
- Backward compatibility for schema changes
- An availability SLA for Admin APIs
Impact:
- fabricgov may break after Microsoft updates
- Always test in a non-production environment first
Dynamic throttling
Microsoft may dynamically adjust limits:
- Rate limits may vary by tenant
- Peak hours may have more aggressive limits
- Tenants with a history of abuse may have reduced limits
Impact:
- Checkpoint timing may vary between runs
- 1h30min pauses may not be sufficient in some cases
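Because the required pause can vary, a wrapper script might prefer a server-provided hint over a fixed delay when one is available. The sketch below assumes the 429 response may carry a standard `Retry-After` header in seconds and falls back to the 1h30min pause otherwise; whether the Admin APIs send that header for this limit is not confirmed by this document.

```python
def pause_seconds(headers, default_pause=90 * 60):
    """Pick the longer of the default pause and any Retry-After hint."""
    raw = headers.get("Retry-After")
    try:
        advertised = int(raw)
    except (TypeError, ValueError):
        advertised = 0  # header missing or not a plain number of seconds
    return max(default_pause, advertised)
```

The lookup is case-sensitive because `headers` is treated as a plain dict here; HTTP client libraries usually expose a case-insensitive mapping instead.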
Planned Limitation Removals
v0.7.0
- Automatically generated HTML report from collected data
v0.8.0
- fabricgov analyze: automatic governance findings (datasets without owners, external users, workspaces without refresh)
v0.9.0
- Azure Key Vault integration for credential management
- fabricgov diff: comparison between snapshots from different collection runs
v1.0.0
- Full documentation via MkDocs
Workarounds and Solutions
For very large tenants (2000+ items)
Option 1: Scheduled collection
- Configure a cron job or Task Scheduler
- Run overnight
- Results available in the morning
Option 2: Distributed collection
- Use multiple Service Principals (not officially documented/supported)
- Each SP collects a subset of workspaces
- Aggregate results manually
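For the distributed option, the workspace list has to be divided into disjoint subsets, one per Service Principal. A round-robin split is one simple way to do that; the function below is an illustrative helper, not part of fabricgov, and how each subset is then fed to a collection run is left open.

```python
def partition(workspace_ids, worker_count):
    """Split IDs round-robin into `worker_count` disjoint subsets."""
    return [workspace_ids[i::worker_count] for i in range(worker_count)]


print(partition(list(range(7)), 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```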
Reporting Limitations
If you find an undocumented limitation:
- Check whether it is already listed in this document
- Open an Issue on GitHub
- Include:
  - Description of the limitation
  - Environment (tenant size, collection type)
  - Full output/error
  - Steps to reproduce