How Cloud Sync duplicated files across remotes and broke versioning and the conflict-resolution rules I implemented to reconcile copies

gordan

1 month ago

Cloud synchronization tools are widely celebrated for their convenience, enabling seamless access to files across devices and platforms. Used correctly, they simplify our digital lives and enhance collaboration. However, when multiple systems are involved—different remotes, automated scripts, and version-controlled environments—things can go sideways. My custom conflict-resolution rules and versioning logic were once robust enough to enable distributed workflows, until cloud sync workflows intervened in unpredictable and destructive ways.

TLDR:

Using cloud sync services across multiple remotes led to inadvertent file duplication, breaking my file versioning system and skirted my manually-implemented conflict-resolution rules. The core issue arose from automated sync behavior misinterpreting file changes, creating ghost duplicates that shadowed legitimate versions. In the end, efforts to reconcile differences across sources made conflicts worse. This breakdown exposed major risks in trusting third-party sync services without tight integration into custom file workflows.

Understanding the Initial Architecture

To contextualize what went wrong, it’s important to first understand how the system was originally structured. My environment consisted of three key components:

A local development environment on my daily workstation.
A cloud-sync provider (in this case, Dropbox and later Resilio Sync) to keep files in sync across devices.
A conflict-resolution and versioning system implemented using file naming conventions and metadata tracking—version history stored as text-based changelogs and hash records.

Files were designed to sync across remotes without triggering false conflicts. I maintained a strict timestamp and checksum verification framework, where every version-controlled file stored changes explicitly, and scripts checked file integrity before overwriting. My system assumed that time-based modification and MD5 hashing would prevent file duplication and version corruption.

What Cloud Sync Got Wrong

On paper, cloud sync services honor the idea of file syncing: they compare metadata, look at timestamps and file sizes, then mirror changes to all connected devices. But problems begin when:

Time zones are interpreted inconsistently (UTC vs. local time).
Dropped connections cause partial updates or redundant uploads.
Files edited even marginally differently on two remotes simultaneously are interpreted as “conflicting.”

In my case, these seemingly minor misinterpretations caused a cascade of failures:

Sync programs saw two files—identical in content but milliseconds apart in mod-time—as different.
These “conflicts” were resolved by automatically duplicating the files as filename (conflicted copy) rather than honoring my hash-tested concat script.
Each duplication triggered an update across all devices, which in turn created new versions of the same conflicted content in other locations.

The very safeguards meant to prevent conflict were blindsided by the simplistic logic of cloud sync services.

Versioning, Broken and Obsolete

My versioning system relied on consistent naming and structured hashes. Because every “official” change required an incremented suffix (for example, document_v4.txt) and matching checksum log, duplication from syncing broke that system in the following ways:

Cloud-created duplicate files did not follow naming conventions, e.g.,
document (Conflicted copy from laptop 01).txt
The script that matched suffixes to versions ignored those unregistered files.
Outdated but technically newer timestamped files overwrote manually validated content.

Even more critically, updates auto-merged across devices with unsynchronized clock settings altered metadata like “last modified” times, which ruined chronological alignment. At scale, this meant older files replaced or co-existed with newer content indiscriminately.

The Conflict-Resolution Rules That Failed

I’d implemented a simple but effective set of rules to resolve conflicts across environments, which included the following principles:

Never overwrite a file unless its hash or suffix number changes.
If a file has been altered, store the prior version in a changelog directory before committing the new one.
Manually verified conflicts must be resolved before merging to central storage.

Unfortunately, cloud sync operated outside my rules:

It did not recognize file naming conventions as indicators of priority or version.
It failed to preserve file hash integrity during renames and duplicates.
It did not preserve my metadata sidecar files alongside their primary file counterparts.

In a more sophisticated environment, such as with git or other full-fledged version control systems, this would have triggered atomic reverts or flagged alerts. The inherently non-transactional model of cloud sync made silent merging the default behavior. My resolution system had no way to detect—let alone prioritize—these changes.

The Domino Effect Across Remotes

Because files synced in under 30 seconds and my automation loop re-scanned files every 2 minutes, it didn’t take long before redundant files spread across all environments. On some remotes, the filename structure was preserved but the file content differed, causing the diff algorithm to collapse. It assumed “minor changes” and skipped hash validation altogether.

Worse still, because each sync action was interpreted as user-intent, my devices kept uploading “corrective copies” back to the cloud, digressing farther from the true baseline file state. Manual intervention only increased the rates of divergence.

Where Automation Undermines Trust

I had designed the workflow with automation in mind—trusting that consistent rulesets and hashing would keep order. But trusting sync services to honor the rules was naïve. They prioritized file throughput and low latency over semantic consistency. This atmosphere of automated workflow, compounded by cloud autonomy, undermined:

Accountability — I couldn’t determine which file was actually the authoritative version.
Traceability — File histories were polluted with conflicting branch histories outside my control.
Recoverability — Once versioning broke, even backups propagated the inconsistencies.

At one point, my control scripts reported over 800 “valid” file versions across just 100 actual documents. Parsing out which were valid, authoritative, or merely conflicted ghost files took weeks of clean-up work.

Lessons Learned and Final Thoughts

This experience radically changed my attitude toward cloud sync solutions. They’re viable for casual document sharing and collaboration where file integrity isn’t mission-critical. But in any workflow deeply reliant on structured versioning, automation logic, or hash-based validity checks, they introduce more liabilities than they solve.

Some final recommendations from this debacle:

Keep sync and versioning separated. Never allow cloud sync software to act as your version control system.
Use transactional solutions for critical systems. Git, Mercurial, or Fossil—all designed to uphold data integrity under concurrency.
Disable auto-merge if possible. Favor alerts and pauses over silent overwrites.
Track hash history, not just timestamps. Checksums are far more reliable, especially across geographic timezones and remotes.

Ultimately, what broke was not just file integrity—but the illusion that sync meant safety. In a distributed world, control isn’t shared—it’s distributed. And unless uncertainty is explicitly resolved, it only propagates.