Data Extraction
The data extraction engine operates through two primary architectures: API-based integrations for centralized platforms and mobile-based SDKs for on-device health data. API sources automatically retrieve up to seven days of historical data, while mobile sources can extract up to thirty days depending on platform constraints. To maintain data integrity, the backend engineering team must implement deduplication logic that evaluates both the document_version and the datetime fields to determine if an incoming payload should overwrite existing database records.
Once user data authorization has been configured, data extraction can begin. ROOK extracts health data from two primary types of sources:
- API-Based Data Sources: Platforms with centralized APIs, such as Fitbit, Garmin, and Polar.
- Mobile-Based Data Sources: Platforms like Android, Apple Health, and Health Connect, where data resides only on the user's device.
API-Based Extractions
For API-based sources, ROOK employs a combination of polling and webhook integration to retrieve data.
Workflow for API-Based Extractions
- User Authorization:
  - As explained in Data Authorization, users authorize access via a web interface or app view, configured using the /authorizer endpoint.
- Data Retrieval:
  - ROOK periodically queries the API (polling) and listens to webhooks for real-time updates.
  - Redundant mechanisms ensure consistent data retrieval, even when one method is temporarily unavailable.
- Data Delivery:
  - Extracted data is processed and delivered via:
    - ROOK Webhooks for real-time updates.
    - ROOK API for on-demand queries.
Key Features
- Pre-Existing Data:
  - Upon authorization, ROOK retrieves up to 7 days of pre-existing data from supported sources. Refer to the Pre-Existing Data feature for details.
- Custom Extraction Times:
  - Default extraction times are 00:01 for physical summaries and 12:00 for sleep summaries (user's local time).
  - Clients can request custom extraction times. ROOK uses the user's time zone to adjust scheduling. Learn more about the Time Zone Feature.
- Retry Logic:
  - If a summary is unavailable at the scheduled extraction time, ROOK retries extraction:
    - Day 1: 23 attempts, one every hour.
    - Next 30 Days: One attempt daily at the configured time.
    - Successful extractions stop the retry process.
- Duplication Handling:
  - ROOK evaluates and updates duplicate summaries, sending the most complete version with an incremented document_version. Learn more about the Duplication Feature.
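The retry schedule described under Retry Logic can be illustrated with a short sketch. This is not ROOK's implementation; the function name `retry_schedule` is hypothetical, and the sketch assumes that the 23 hourly attempts start one hour after the scheduled extraction and that the 30 daily attempts begin the following day at the same local time.

```python
from datetime import datetime, timedelta, timezone


def retry_schedule(first_attempt: datetime) -> list[datetime]:
    """Return the retry times that would follow a failed first attempt.

    Hypothetical helper: the exact start of each phase is an assumption.
    """
    # Day 1: 23 hourly retries after the scheduled attempt.
    hourly = [first_attempt + timedelta(hours=h) for h in range(1, 24)]
    # Next 30 days: one retry per day at the configured local time.
    daily = [first_attempt + timedelta(days=d) for d in range(1, 31)]
    return hourly + daily


# Example: a physical summary scheduled for 00:01 in a UTC-6 user time zone.
user_tz = timezone(timedelta(hours=-6))
first = datetime(2024, 5, 1, 0, 1, tzinfo=user_tz)
schedule = retry_schedule(first)
```

In practice the loop would stop as soon as one attempt succeeds, so the full list is only an upper bound on the attempts made.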
The document_version field identifies the most recent version of a dataset sent by ROOK, but versions are only comparable between datasets that share the same datetime.
The logic is simple:
- If a higher version arrives and the datetime matches, replace the entire dataset.
- If the datetime does not match (for example, it belongs to a previous day), the dataset must not replace the current one, even if the version is higher.
- If a lower version arrives, it should be discarded.
For more information, refer to the document_version usage guide.
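On the client side, the three rules above might be applied as in the following minimal sketch. The `Summary` class, the `upsert` function, and the in-memory `store` are hypothetical; only the `document_version` and `datetime` field names come from this page. The sketch keys records by user and datetime, so a payload for another day can never overwrite the current one, and it assumes that an equal version is discarded like a lower one.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    datetime: str            # the dataset's datetime, e.g. "2024-05-01T00:00:00Z"
    document_version: int    # version assigned by ROOK
    payload: dict            # remaining fields of the dataset


# Hypothetical in-memory store keyed by (user_id, datetime).
Store = dict[tuple[str, str], Summary]


def upsert(store: Store, user_id: str, incoming: Summary) -> bool:
    """Store the incoming summary if the rules allow it; return True if stored."""
    key = (user_id, incoming.datetime)
    current = store.get(key)
    # Same datetime and a version that is not higher: discard.
    # Treating an equal version as a discard is an assumption.
    if current is not None and incoming.document_version <= current.document_version:
        return False
    # Same datetime with a higher version, or first record for this datetime:
    # replace the entire dataset.
    store[key] = incoming
    return True
```

Because the datetime is part of the key, a version 3 payload for the previous day is stored under its own key and leaves today's version 2 record untouched.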
Mobile-Based Extractions
Mobile-based extractions rely on ROOK SDKs or the ROOK Extraction App to access health data directly from users' devices.
Workflow for Mobile-Based Extractions
- User Authorization:
  - Authorization is initiated via SDK popups in the client app or through the ROOK Extraction App.
- Data Retrieval:
  - SDKs extract data every hour, respecting source-specific limitations such as:
    - App states (foreground or background).
    - Device settings (e.g., locked screens).
    - Request quotas or historical data limits.
- Data Delivery:
  - Extracted data is sent to ROOK servers for processing and delivered to clients via webhooks or the ROOK API.
  - Certain metrics, such as step events, are available locally on the device and can be accessed directly via the SDK.
Key Features
- Pre-Existing Data:
  - Mobile-based sources provide up to 30 days of pre-existing data, depending on the platform’s restrictions.
- Limitations:
  - Data availability depends on platform-specific constraints, such as request limits and device states. Refer to the Data Sources section for details.
Comparing API and Mobile-Based Extractions
| Feature | API-Based Extractions | Mobile-Based Extractions |
|---|---|---|
| Use Case | API-based data sources | Data stored on mobile devices |
| Examples | Fitbit, Garmin, Polar, Oura | Apple Health, Health Connect, Android, iOS |
| Pre-Existing Data | Up to 7 days | Up to 30 days |
| Tools Provided by ROOK | Connections Page, /authorizer | SDKs, Extraction App |
To learn which providers are API-based and which are SDK-based, consult Data Sources.
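As a rough illustration of the pre-existing data limits in the table, the sketch below estimates the earliest date a client could expect data for after a user authorizes a source. The function `backfill_window` and the two source sets are hypothetical and list only the examples named on this page. The 7-day and 30-day figures are upper bounds, so the actual window may be shorter depending on the platform.

```python
from datetime import date, timedelta

# Example sources taken from the comparison table; not an exhaustive list.
API_SOURCES = {"Fitbit", "Garmin", "Polar", "Oura"}
MOBILE_SOURCES = {"Apple Health", "Health Connect", "Android"}


def backfill_window(source: str, authorized_on: date) -> tuple[date, date]:
    """Return the widest (start, end) range of pre-existing data to expect."""
    if source in API_SOURCES:
        max_days = 7     # API-based sources: up to 7 days
    elif source in MOBILE_SOURCES:
        max_days = 30    # mobile-based sources: up to 30 days
    else:
        raise ValueError(f"Unknown source: {source}")
    return authorized_on - timedelta(days=max_days), authorized_on


api_window = backfill_window("Fitbit", date(2024, 5, 31))
mobile_window = backfill_window("Apple Health", date(2024, 5, 31))
```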
Recommendations
- Both API and mobile-based extractions can be used to access a wide range of health data.
- ROOK’s retry logic and duplication handling enhance data reliability.
We recommend integrating both options. Far from being mutually exclusive, they complement each other by encompassing more data sources.
Further Reading
- Data Authorization: Learn about user authorization prerequisites.
- Data Delivery: Understand how data is sent to clients via API and webhooks.
- Data Sources: Explore source-specific capabilities and limitations.
- SDK Documentation: Learn about mobile data extraction using SDKs.