Spare Incident Report
February 4, 2026
Executive Summary
On February 4, 2026, the Spare platform experienced service disruptions primarily impacting customers served by our CA region infrastructure. Users experienced two distinct windows of instability characterized by system slowness, operation timeouts, and intermittent error messages.
The disruption was caused by database resource contention following a scheduled infrastructure adjustment, compounded by a "cold cache" state and stale query optimization statistics that led to connection pool exhaustion. Our engineering team addressed the issues through immediate capacity increases, manual optimization of database statistics, and temporary suspension of non-essential background tasks to prioritize core service stability. Service was fully restored to all affected organizations by 11:14 am PT.
Incident Details
The incident occurred in two phases. The first began at 7:20 am PT when monitoring systems alerted the team to severe connection pool exhaustion in one of our database clusters. A second period of instability occurred at 10:16 am PT when, under peak morning load, the database query planner was working from stale statistics and produced inefficient query plans.
Cause
The primary root cause was the exhaustion of the database connection pool, which prevented backend services from processing queries.
Our detailed investigation identified two contributing factors:
- Resource Contention: A database scale-down operation performed the previous evening resulted in insufficient capacity to handle the surge in morning traffic.
- Stale Query Statistics: Following the scale-up required to restore capacity, the database query planner lacked up-to-date statistics, which had to be refreshed with a manual ANALYZE operation. This caused the system to fall back on inefficient query plans, leading to high disk I/O and further instances of connection pool exhaustion.
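For readers interested in the mechanics, the sketch below shows one way this kind of staleness can be spotted, assuming a PostgreSQL-family database (consistent with the ANALYZE operation described above). It is illustrative only: the use of Python with the psycopg2 driver, the connection string, and the freshness threshold are assumptions, not a description of our production tooling. The check reads PostgreSQL's pg_stat_user_tables view, which records when each table's planner statistics were last refreshed, either manually or by autovacuum.

```python
# Illustrative sketch only: the connection string and threshold are assumptions.
import os
from datetime import datetime, timedelta, timezone

import psycopg2  # example PostgreSQL client, used here purely for illustration

# Placeholder DSN, e.g. "postgresql://user:pass@host:5432/dbname"
DSN = os.environ.get("DATABASE_URL", "postgresql://localhost/postgres")
STALE_AFTER = timedelta(hours=12)  # arbitrary freshness threshold for this example


def report_stale_statistics(dsn: str) -> None:
    """Print tables whose planner statistics have not been refreshed recently."""
    now = datetime.now(timezone.utc)
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT schemaname, relname, last_analyze, last_autoanalyze
            FROM pg_stat_user_tables
            """
        )
        for schema, table, last_analyze, last_autoanalyze in cur.fetchall():
            # Take the most recent of a manual ANALYZE and autovacuum's auto-analyze.
            freshest = max(filter(None, [last_analyze, last_autoanalyze]), default=None)
            if freshest is None or now - freshest > STALE_AFTER:
                print(f"{schema}.{table}: statistics stale (last refreshed: {freshest})")


if __name__ == "__main__":
    report_stale_statistics(DSN)
```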
Mitigation and Resolution
Our team’s response focused on immediate capacity restoration and system stabilization:
- Capacity Increase: Engineering immediately scaled the CA database back to its previous capacity tier and increased the number of API pods to distribute the load more effectively.
- Query Optimization: To resolve the second wave of instability, the team manually executed ANALYZE commands across the database clusters to refresh statistics and restore efficient query processing (a sketch of this operation follows this list).
- Workload Management: We temporarily disabled non-critical background jobs during the incident to reduce database pressure.
- Service Restoration: Full service was confirmed for all organizations served from our CA region by 11:14 am PT.
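As a minimal sketch of what the statistics refresh described under "Query Optimization" involves, again assuming a PostgreSQL-family database and using Python with psycopg2 purely for illustration (the cluster connection strings below are placeholders, not real endpoints), the operation amounts to running ANALYZE against each affected database:

```python
# Illustrative sketch only: cluster DSNs are placeholders, not real endpoints.
import psycopg2

# Hypothetical list of database clusters needing a statistics refresh.
CLUSTER_DSNS = [
    "postgresql://user:pass@ca-cluster-1.example.internal:5432/spare",
    "postgresql://user:pass@ca-cluster-2.example.internal:5432/spare",
]


def refresh_statistics(dsn: str) -> None:
    """Run ANALYZE so the planner works from current row counts and histograms."""
    conn = psycopg2.connect(dsn)
    try:
        conn.autocommit = True  # run the statement outside an explicit transaction
        with conn.cursor() as cur:
            cur.execute("ANALYZE;")  # database-wide statistics refresh
    finally:
        conn.close()


if __name__ == "__main__":
    for dsn in CLUSTER_DSNS:
        refresh_statistics(dsn)
        print(f"Statistics refreshed for cluster {dsn.split('@')[-1]}")
```

Because ANALYZE samples rows rather than scanning entire tables, a database-wide refresh like this is typically quick relative to the instability caused by stale statistics.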
Timeline
All times are in Pacific Time (PT).
- 07:20 am: Monitoring detects connection pool exhaustion; Incident Team responds.
- 07:26 am: Team initiates CA database scale-up to restore capacity.
- 08:00 am: API pod health checks temporarily removed to reduce database load during recovery.
- 09:21 am: Database scale-up completes; initial service restoration achieved.
- 10:16 am: A second wave of high error rates is detected as morning load peaks.
- 11:03 am: Non-essential background jobs are disabled to prioritize core platform stability.
- 11:14 am: Manual database optimization (ANALYZE) completed; full service restored to all organizations.
- 12:52 pm: Background jobs re-enabled after confirming sustained system stability.
Next Steps
To ensure continued reliability and prevent a recurrence of this resource contention, we are undertaking the following:
- Infrastructure Configuration: We are updating our infrastructure-as-code (Terraform) configurations to prevent automatic down-scaling of production-critical database tiers, and to schedule database statistics refreshes (via ANALYZE operations) more appropriately.
- Operational Procedures: We are implementing a mandatory requirement to refresh database statistics following any manual or automated scaling event.
- Performance Monitoring: We are enhancing our alerts for "cold cache" symptoms and disk I/O wait times to detect potential bottlenecks before they result in connection exhaustion.
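To make the monitoring item above concrete, the following sketch shows one shape such a check could take, once more assuming a PostgreSQL-family database; the thresholds, connection string, and use of Python with psycopg2 are illustrative assumptions rather than a description of our monitoring stack. It compares connections in use against the server's max_connections setting and counts backends currently waiting on disk I/O:

```python
# Illustrative sketch only: thresholds and the alerting hook are assumptions.
import os

import psycopg2

DSN = os.environ.get("DATABASE_URL", "postgresql://localhost/postgres")
CONNECTION_UTILIZATION_LIMIT = 0.80  # example threshold: 80% of max_connections
IO_WAIT_BACKEND_LIMIT = 20           # example threshold: backends waiting on disk I/O


def check_database_pressure(dsn: str) -> list[str]:
    """Return warning messages if connection usage or I/O waits look unhealthy."""
    warnings = []
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM pg_stat_activity;")
        in_use = cur.fetchone()[0]

        cur.execute("SELECT setting::int FROM pg_settings WHERE name = 'max_connections';")
        max_connections = cur.fetchone()[0]

        cur.execute("SELECT count(*) FROM pg_stat_activity WHERE wait_event_type = 'IO';")
        io_waiters = cur.fetchone()[0]

    if in_use / max_connections > CONNECTION_UTILIZATION_LIMIT:
        warnings.append(f"Connection usage high: {in_use}/{max_connections}")
    if io_waiters > IO_WAIT_BACKEND_LIMIT:
        warnings.append(f"{io_waiters} backends waiting on disk I/O")
    return warnings


if __name__ == "__main__":
    for message in check_database_pressure(DSN):
        print(f"ALERT: {message}")  # in practice this would page the on-call engineer
```

In practice a check like this would run continuously and feed an alerting pipeline rather than print to a console.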
We understand that this disruption was more than just a technical failure; it was a breakdown in the service you and your riders depend on every day. We are deeply sorry for the immense frustration and operational strain this caused your teams. Reliability is the foundation of our partnership, and we fell short of that standard today. We are working with absolute urgency to implement the safeguards required to deliver the level of uptime performance you expect.