Spare Platform Service Disruption

Incident Report for Spare Platform

Postmortem

February 4, 2026

Executive Summary

On February 4, 2026, the Spare platform experienced service disruptions primarily impacting customers served by our CA region infrastructure. Users experienced two distinct windows of instability characterized by system slowness, operation timeouts, and intermittent error messages.

The disruption was caused by database resource contention following a scheduled infrastructure adjustment, compounded by a "cold cache" state and stale query optimization statistics that led to connection pool exhaustion. Our engineering team addressed the issues through immediate capacity increases, manual optimization of database statistics, and temporary suspension of non-essential background tasks to prioritize core service stability. Service was fully restored to all affected organizations by 11:14 am PT.

Incident Details

The incident occurred in two phases. The first began at 7:20 am PT when monitoring systems alerted the team to severe connection pool exhaustion in one of our database clusters. A second period of instability occurred at 10:16 am PT as the system struggled to optimize queries under peak morning load.

Cause

The primary root cause was the exhaustion of the database connection pool, which prevented backend services from processing queries.

Our detailed investigation identified two contributing factors:

  • Resource Contention: A database scale-down operation performed the previous evening resulted in insufficient capacity to handle the surge in morning traffic.
  • Stale Query Statistics: After the scale-up that restored capacity, the database query planner was working from out-of-date statistics, which could only be refreshed by a manual ANALYZE operation. Until then, the system chose inefficient query plans, leading to high disk I/O and further connection pool exhaustion.
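
As a rough illustration of why slower queries exhaust a connection pool, the sketch below applies Little's law (busy connections ≈ query arrival rate × time each query holds a connection). The pool size, traffic rate, and query durations are hypothetical values chosen for the example; none of them come from the incident data.

```python
import math

def connections_needed(queries_per_second: int, avg_query_ms: int) -> int:
    """Little's law: average busy connections = arrival rate x hold time."""
    return math.ceil(queries_per_second * avg_query_ms / 1000)

def is_pool_exhausted(pool_size: int, queries_per_second: int, avg_query_ms: int) -> bool:
    """True when steady-state demand exceeds the number of pooled connections."""
    return connections_needed(queries_per_second, avg_query_ms) > pool_size

# Hypothetical figures, for illustration only.
POOL_SIZE = 100
MORNING_QPS = 400

# Efficient plans: each query holds a connection for about 50 ms.
efficient = is_pool_exhausted(POOL_SIZE, MORNING_QPS, avg_query_ms=50)
# Inefficient plans: the same queries now take about 500 ms each.
inefficient = is_pool_exhausted(POOL_SIZE, MORNING_QPS, avg_query_ms=500)

print(efficient, inefficient)
```

Under these assumed numbers, the same traffic needs 20 connections with efficient plans and 200 with inefficient ones, so a tenfold slowdown alone is enough to overrun a 100-connection pool.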

Mitigation and Resolution

Our team’s response focused on immediate capacity restoration and system stabilization:

  • Capacity Increase: Engineering immediately scaled the CA database back to its previous capacity tier and increased the number of API pods to distribute the load more effectively.
  • Query Optimization: To resolve the second wave of instability, the team manually executed ANALYZE commands across the database clusters to refresh statistics and restore efficient query processing.
  • Workload Management: We temporarily disabled non-critical background jobs during the incident to reduce database pressure.
  • Service Restoration: Full service was confirmed for all organizations served from our CA region by 11:14 am PT.
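
The statistics refresh described above can be sketched in a few lines. The report does not name the database engine, so this example uses Python's built-in sqlite3 module purely as a stand-in; the `trips` table, its index, and the `refresh_statistics` helper are hypothetical. ANALYZE syntax and the location of the collected statistics differ between database engines.

```python
import sqlite3

def refresh_statistics(conn: sqlite3.Connection) -> list:
    """Run ANALYZE and return the planner statistics SQLite collected."""
    conn.execute("ANALYZE")
    # SQLite stores the results of ANALYZE in the sqlite_stat1 table.
    return conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()

# Hypothetical schema and data, standing in for a production table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER PRIMARY KEY, org_id INTEGER, status TEXT)")
conn.execute("CREATE INDEX idx_trips_org ON trips (org_id)")
conn.executemany(
    "INSERT INTO trips (org_id, status) VALUES (?, ?)",
    [(i % 10, "done") for i in range(1000)],
)
conn.commit()

stats = refresh_statistics(conn)
for table_name, index_name, stat in stats:
    print(table_name, index_name, stat)
```

Once statistics like these exist, the query planner can estimate how selective an index is instead of guessing, which is the effect the manual ANALYZE runs were intended to restore.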

Timeline

All times are in Pacific Time (PT).

  • 07:20 am: Monitoring detects connection pool exhaustion; Incident Team responds.
  • 07:26 am: Team initiates CA database scale-up to restore capacity.
  • 08:00 am: API pod health checks temporarily removed to reduce database load during recovery.
  • 09:21 am: Database scale-up completes; initial service restoration achieved.
  • 10:16 am: A second wave of high error rates is detected as morning load peaks.
  • 11:03 am: Non-essential background jobs are disabled to prioritize core platform stability.
  • 11:14 am: Manual database optimization (ANALYZE) completed; full service restored to all organizations.
  • 12:52 pm: Background jobs re-enabled after confirming sustained system stability.
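
For readers who want the durations implied by the timeline, the following sketch derives them from the timestamps listed above. The dictionary keys are labels invented for this example; the times themselves are taken from the timeline.

```python
from datetime import datetime

# Timestamps from the timeline above (Pacific Time, 24-hour form).
TIMELINE = {
    "first_alert": "07:20",
    "scale_up_started": "07:26",
    "scale_up_completed": "09:21",
    "second_wave": "10:16",
    "full_restoration": "11:14",
}

def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two named timeline events."""
    fmt = "%H:%M"
    delta = datetime.strptime(TIMELINE[end], fmt) - datetime.strptime(TIMELINE[start], fmt)
    return int(delta.total_seconds() // 60)

print(minutes_between("scale_up_started", "scale_up_completed"))  # 115
print(minutes_between("first_alert", "scale_up_completed"))       # 121
print(minutes_between("second_wave", "full_restoration"))         # 58
print(minutes_between("first_alert", "full_restoration"))         # 234
```

By this arithmetic the first phase lasted about two hours, the second about one hour, and the full span from first alert to full restoration was just under four hours.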

Next Steps

To ensure continued reliability and prevent a recurrence of this resource contention, we are taking the following actions:

  • Infrastructure Configuration: We are updating our infrastructure-as-code (Terraform) configurations to prevent automatic down-scaling of production-critical database tiers and to schedule database statistics refreshes (via the ANALYZE operation) more appropriately.
  • Operational Procedures: We are implementing a mandatory requirement to refresh database statistics following any manual or automated scaling event.
  • Performance Monitoring: We are enhancing our alerts for "cold cache" symptoms and disk I/O wait times to detect potential bottlenecks before they result in connection exhaustion.
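
One way an early-warning check of the kind described in the monitoring item might look is sketched below. It flags a metric only when several consecutive samples exceed a threshold, which avoids alerting on a single spike. The thresholds, sample counts, and function names are assumptions made for this example and do not describe the actual alerting configuration.

```python
from typing import List, Sequence

def sustained_breach(samples: Sequence[float], threshold: float, consecutive: int) -> bool:
    """True if `consecutive` samples in a row are above `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= consecutive:
            return True
    return False

# Hypothetical thresholds, for illustration only.
IO_WAIT_THRESHOLD_PCT = 20.0
POOL_UTILIZATION_THRESHOLD_PCT = 80.0
CONSECUTIVE_SAMPLES = 3

def early_warnings(io_wait_pct: Sequence[float], pool_utilization_pct: Sequence[float]) -> List[str]:
    """Return warnings for metrics that stay above their thresholds."""
    warnings = []
    if sustained_breach(io_wait_pct, IO_WAIT_THRESHOLD_PCT, CONSECUTIVE_SAMPLES):
        warnings.append("disk I/O wait elevated")
    if sustained_breach(pool_utilization_pct, POOL_UTILIZATION_THRESHOLD_PCT, CONSECUTIVE_SAMPLES):
        warnings.append("connection pool nearing exhaustion")
    return warnings

# Disk I/O wait climbs for three samples while the pool still has headroom.
print(early_warnings([5, 25, 30, 35], [40, 50, 60, 70]))
```

In this example the I/O wait warning fires before pool utilization crosses its threshold, which is the ordering such an alert is meant to achieve: catching the bottleneck before connections run out.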

We understand that this disruption was more than just a technical failure; it was a breakdown in the service you and your riders depend on every day. We are deeply sorry for the immense frustration and operational strain this caused your teams. Reliability is the foundation of our partnership, and we fell short of that standard today. We are working with absolute urgency to implement the safeguards required to deliver the level of uptime performance you expect.

Posted Feb 04, 2026 - 21:42 PST

Resolved

The platform loading and performance issues have been resolved as of 11:22 AM Pacific Time. Our engineering team has confirmed service recovery, and the platform is operating normally at this time.

We will continue to monitor system performance closely to ensure ongoing stability.

If you enacted your Business Continuity Plan (BCP) during the incident, you may now fully transition back to normal operations.

Your Partner Success Manager (PSM) will follow up to coordinate a post-incident review and share additional context as needed.

Thank you for your patience and understanding while we worked through this issue.
Posted Feb 04, 2026 - 12:06 PST

Update

The platform remains under active monitoring as our engineering team continues to investigate the loading and performance issues reported earlier.

At this time, there are no additional updates to share. We are closely tracking system behavior and will communicate as soon as there is new information or a change in status.

Some organizations may continue to experience intermittent issues. If you or your team are impacted, please validate whether drivers and operators can continue using the Spare Driver app. If not, we recommend continuing to follow your Spare Business Continuity Plan (BCP).

Helpful resources for reference:

• Business Continuity Plan (BCP):
https://www.notion.so/189e9a7769a6802185f1e65cc30d5321?pvs=25

• Disaster Recovery – Emergency Response Guide:
https://drive.google.com/file/d/1KXtfZy70AGhM48QElJGMvxnwqMAV3Shx/view

• Setting up Disaster Recovery Backups:
https://help.sparelabs.com/en/articles/10134588-setting-up-disaster-recovery-backups

• Using Dispatch Backup Files:
https://help.sparelabs.com/en/articles/10263338-using-dispatch-backup-files

We’ll provide another update as soon as more information becomes available. Thank you for your continued patience.
Posted Feb 04, 2026 - 11:54 PST

Update

The platform remains under active monitoring as our engineering team continues to investigate the loading and performance issues reported earlier.

At this time, there are no additional updates to share. We are closely tracking system behavior and will communicate as soon as there is new information or a change in status.

Some organizations may continue to experience intermittent issues. If you or your team are impacted, please validate whether drivers and operators can continue using the Spare Driver app. If not, we recommend continuing to follow your Spare Business Continuity Plan (BCP).

Helpful resources for reference:

• Business Continuity Plan (BCP):
https://www.notion.so/189e9a7769a6802185f1e65cc30d5321?pvs=25

• Disaster Recovery – Emergency Response Guide:
https://drive.google.com/file/d/1KXtfZy70AGhM48QElJGMvxnwqMAV3Shx/view

• Setting up Disaster Recovery Backups:
https://help.sparelabs.com/en/articles/10134588-setting-up-disaster-recovery-backups

• Using Dispatch Backup Files:
https://help.sparelabs.com/en/articles/10263338-using-dispatch-backup-files

We’ll provide another update as soon as more information becomes available. Thank you for your continued patience.
Posted Feb 04, 2026 - 11:39 PST

Monitoring

A fix has been applied (at 11:22 PST) to address the platform loading and performance issues reported earlier. Our engineering team is actively monitoring the platform to confirm full recovery and stability.

Some organizations may continue to experience intermittent issues while monitoring is ongoing. If you or your team encounter any problems, we recommend the following steps:

• Perform a hard refresh in your browser
• Close and reopen the Spare Driver app on affected devices

If you enacted your Business Continuity Plan (BCP) during the incident, you may begin transitioning back to normal operations as appropriate while monitoring continues.

Helpful resources for reference:

• Business Continuity Plan (BCP):
https://www.notion.so/189e9a7769a6802185f1e65cc30d5321?pvs=25

• Disaster Recovery – Emergency Response Guide:
https://drive.google.com/file/d/1KXtfZy70AGhM48QElJGMvxnwqMAV3Shx/view

• Setting up Disaster Recovery Backups:
https://help.sparelabs.com/en/articles/10134588-setting-up-disaster-recovery-backups

• Using Dispatch Backup Files:
https://help.sparelabs.com/en/articles/10263338-using-dispatch-backup-files

We’ll provide another update once monitoring is complete or if additional information becomes available. Thank you for your patience.
Posted Feb 04, 2026 - 11:24 PST

Update

At this time, there are no new updates to share. Our engineering team continues to actively investigate the platform loading and performance issues and is monitoring system behavior closely.

Some organizations may continue to experience intermittent issues while this investigation is ongoing.

If your team is unable to operate normally, please confirm whether drivers and operators can continue using the Spare Driver app. If not, we recommend enacting your Business Continuity Plan (BCP).

Helpful resources for reference:

• Business Continuity Plan (BCP):
https://www.notion.so/189e9a7769a6802185f1e65cc30d5321?pvs=25

• Disaster Recovery – Emergency Response Guide:
https://drive.google.com/file/d/1KXtfZy70AGhM48QElJGMvxnwqMAV3Shx/view

• Setting up Disaster Recovery Backups:
https://help.sparelabs.com/en/articles/10134588-setting-up-disaster-recovery-backups

• Using Dispatch Backup Files:
https://help.sparelabs.com/en/articles/10263338-using-dispatch-backup-files

We’ll provide another update as soon as additional information becomes available. Thank you for your continued patience.
Posted Feb 04, 2026 - 11:09 PST

Update

At this time, there are no new updates to share. Our engineering team continues to actively investigate the platform loading and performance issues and is monitoring system behavior closely.

Some organizations may continue to experience intermittent issues while this investigation is ongoing.

If your team is unable to operate normally, please confirm whether drivers and operators can continue using the Spare Driver app. If not, we recommend enacting your Business Continuity Plan (BCP).

Helpful resources for reference:

• Business Continuity Plan (BCP):
https://www.notion.so/189e9a7769a6802185f1e65cc30d5321?pvs=25

• Disaster Recovery – Emergency Response Guide:
https://drive.google.com/file/d/1KXtfZy70AGhM48QElJGMvxnwqMAV3Shx/view

• Setting up Disaster Recovery Backups:
https://help.sparelabs.com/en/articles/10134588-setting-up-disaster-recovery-backups

• Using Dispatch Backup Files:
https://help.sparelabs.com/en/articles/10263338-using-dispatch-backup-files

We’ll provide another update as soon as additional information becomes available. Thank you for your continued patience.
Posted Feb 04, 2026 - 10:54 PST

Update

At this time, there are no new updates to share. Our engineering team continues to actively investigate the platform loading and performance issues and is monitoring system behavior closely.

Some organizations may continue to experience intermittent issues while this investigation is ongoing.

If your team is unable to operate normally, please confirm whether drivers and operators can continue using the Spare Driver app. If not, we recommend enacting your Business Continuity Plan (BCP).

Helpful resources for reference:

• Business Continuity Plan (BCP):
https://www.notion.so/189e9a7769a6802185f1e65cc30d5321?pvs=25

• Disaster Recovery – Emergency Response Guide:
https://drive.google.com/file/d/1KXtfZy70AGhM48QElJGMvxnwqMAV3Shx/view

• Setting up Disaster Recovery Backups:
https://help.sparelabs.com/en/articles/10134588-setting-up-disaster-recovery-backups

• Using Dispatch Backup Files:
https://help.sparelabs.com/en/articles/10263338-using-dispatch-backup-files

We’ll provide another update as soon as additional information becomes available. Thank you for your continued patience.
Posted Feb 04, 2026 - 10:38 PST

Identified

We are seeing a recurrence of the platform loading and performance issues after the earlier recovery. Our engineering team is actively investigating and working to stabilize the platform.

Some organizations may experience slow loading, intermittent errors, or difficulty accessing parts of the platform during this time.

If your team is unable to operate normally, please validate whether drivers and operators can continue using the Spare Driver app. If they cannot, we recommend enacting your Spare Business Continuity Plan (BCP).

Helpful resources:

• Business Continuity Plan (BCP):
https://www.notion.so/189e9a7769a6802185f1e65cc30d5321?pvs=25

• Disaster Recovery – Emergency Response Guide:
https://drive.google.com/file/d/1KXtfZy70AGhM48QElJGMvxnwqMAV3Shx/view

• Setting up Disaster Recovery Backups:
https://help.sparelabs.com/en/articles/10134588-setting-up-disaster-recovery-backups

• Using Dispatch Backup Files:
https://help.sparelabs.com/en/articles/10263338-using-dispatch-backup-files

We’ll provide another update as soon as we have more information to share. Thank you for your continued patience.
Posted Feb 04, 2026 - 10:24 PST

This incident affected: Spare Driver Application (iOS, Android), Administrator Portal (Canada), and AI-Voice.