
Random File Corruption with MMDComposer5 using Markdown from ChatGPT Website

Completed

Comments

5 comments

  • Paul Solt

    Usually I'm trying to edit the bullet points and trying to remove the copious amounts of bold text from ChatGPT. That starts messing up my bullets, and sometimes bullets get corrupted with `***`.

    Sample markdown (Unsure if this will corrupt)

    Let’s break it down in plain terms — this is exactly how Carbon Copy Cloner (CCC) and your backup sizes, timing, and encryption work.
    
    ---
    
    ## 🧮 1. Drive Size Rule
    
    Your backup drive must be **at least as large as the used space** on your Mac — ideally **equal or larger than your internal drive**.
    
    | Internal SSD      | Recommended Clone Drive                       |
    | ----------------- | --------------------------------------------- |
    | 1TB               | 1TB (works fine) or 2TB (more breathing room) |
    | 2TB               | 2TB or 4TB (future-proof)                     |
    | 8TB (future plan) | 8TB backup (when you upgrade)                 |
    
    If you get a **4TB external** now for a **2TB internal**, that’s perfect. It’ll handle growth and lets you keep multiple snapshots if desired.
    
    You don’t need two drives right away — you can start with one and later rotate a second for off-site safety.
    
    ---
    
    ## ⚡️ 2. Clone Speed and Interruption Behavior
    
    ### First clone:
    
    * **Duration:** 1–3 hours for 2TB (depends on drive speed — SSD over Thunderbolt is fastest).
    * **Use impact:** You *can* keep working. CCC runs in the background.
    * **If you unplug or shut down:** It just pauses or fails gracefully. You can resume later.
    
    ### After first clone:
    
    * Incremental updates are **very fast** — often just a few minutes, because only changed files are copied.
    
    You can set it to run **at night** or on **connect**, e.g.:
    
    > “When this drive is connected, update the clone automatically.”
    
    ---
    
    ## 🔒 3. Encryption Behavior
    
    Your **internal drive is already encrypted** if FileVault is turned on
    (System Settings → Privacy & Security → FileVault).
    
    CCC doesn’t automatically carry over that encryption — **you decide when formatting the destination drive**:
    
    ### Two choices:
    
    1. **APFS (Encrypted)** → Secure, requires a password to unlock.
    
       * You set this once in Disk Utility.
       * CCC just clones into it.
       * You must unlock it before use (can auto-unlock if you store the password in your keychain).
       * Strongest privacy — if stolen, it’s unreadable.
    
    2. **APFS (not encrypted)** → Simpler, faster, less secure.
    
       * Anyone with the drive can read data.
       * Fine for a first simple setup.
    
    ✅ **Start simple:** format as plain APFS first → confirm backups work → later, reformat as APFS (Encrypted) once you’re comfortable.
    
    If encrypted, the password is just for that drive — it doesn’t reuse your Mac login password automatically.
    
    ---
    
    ## 🕹 4. Time Machine in 2025
    
    Yes, **Time Machine is still supported** and stable on macOS Sonoma / Sequoia:
    
    * Works best with **APFS** drives now (not HFS+).
    * Performs hourly incremental backups.
    * You can exclude large media folders.
    * Pairs perfectly with a NAS (Synology is great for this).
    
    However:
    
    * It can occasionally hang or fail if your NAS goes offline or permissions change.
    * It’s slower for full restores, but great for recovering single files.
    
    So CCC = “fast full restore.”
    Time Machine = “everyday version control.”
    
    ---
    
    ## 🧱 5. Best Starting Plan for You
    
    ### Step 1 — Prep drive
    
    * Plug in a 4TB SSD.
    * Open **Disk Utility** → Erase → Format: `APFS` (or `APFS (Encrypted)` if ready).
    
    ### Step 2 — Create CCC task
    
    * Choose your internal drive as **Source**.
    * Choose your SSD as **Destination**.
    * Check **SafetyNet off** (or on if you want versioning).
    * Schedule: “Run when drive is connected.”
    * Done. CCC will keep it synced automatically.
    
    ### Step 3 — (Optional later)
    
    * Add a **second drive** → store in your firebox, rotate monthly.
    
    ---
    
    ## 🧩 6. Future Workflow
    
    You’ll eventually have this setup:
    
    | Type                     | Tool               | Drive           | Frequency      | Notes                                |
    | ------------------------ | ------------------ | --------------- | -------------- | ------------------------------------ |
    | System clone             | Carbon Copy Cloner | External SSD #1 | Weekly auto    | Bootable, encrypted later            |
    | Offline safety           | Carbon Copy Cloner | External SSD #2 | Monthly manual | Stored in firebox                    |
    | Incremental file history | Time Machine       | NAS             | Continuous     | File-level restore                   |
    | Raw video archive        | External HDDs      | Labeled by year | Manual         | Quiet storage, disconnected normally |
    
    ---
    
    If you want, I can give you the **exact Disk Utility + CCC setup steps** (what buttons to press, settings to toggle, and what to expect during the first run).
    Would you like that as your next step?
    
  • Fletcher Penney

    Paul,

    I don't see anything immediately wrong in the syntax highlighting from that sample.  And other than the (IMO) odd use of emojis in the headers, I don't see anything that jumps out as “wrong” in the syntax.  But maybe I just skimmed over it??

    If it happens again, could you try a screen recording and using “Undo” to try and backtrack?  That might help give me a better idea as to what is happening.

    I haven't seen anything like what you're describing, so unfortunately I don't have any immediate ideas as to what is happening…

    Thank you!!

  • Paul Solt

    OK, I can easily reproduce this. It's related to the indentation of the file. When I try to fix the indentation (I might be inserting tabs rather than spaces, I'm not sure), everything goes haywire. Undo is unable to fix the problems: the document stays corrupted even after the incorrect content is deleted or modified.
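One way to check the tabs-versus-spaces guess is to look at the leading whitespace of each list line outside the editor. The following is a minimal Python sketch, not part of Composer; the function name `describe_indentation` and the sample string are made up for illustration.

```python
import re

# Optional leading whitespace, then a bullet (*, +, -) or a "1." / "1)" marker.
LIST_ITEM = re.compile(r"^([ \t]*)(?:[*+-]|\d+[.)])\s+")

def describe_indentation(text):
    """Return (line_number, spaces, tabs) for every list item in `text`."""
    report = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = LIST_ITEM.match(line)
        if match:
            indent = match.group(1)
            report.append((number, indent.count(" "), indent.count("\t")))
    return report

# Hypothetical sample: one flush bullet, one indented with spaces, one with a tab.
sample = "* Finalize the pipeline\n\n  * **Single photo**\n\t* **Photo set**\n"
for number, spaces, tabs in describe_indentation(sample):
    print(f"line {number}: {spaces} space(s), {tabs} tab(s)")
```

Any line that reports a non-zero tab count was indented with tabs, which would confirm or rule out the suspicion above.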

    Video
    https://youtu.be/Wk7yrA_6TpQ



    Sample Text:

    1. Edit this to remove top/bottom gibberish.
    2. Try to fix the indentation of the bullets. 

    Actual:
    It starts inserting `**` in the list (with tabs?). Then the formatting gets wonky, and edits start landing in the wrong locations.

    Perfect — you’ve cleared the space and stabilized the commit workflow, which unlocks your core engineering loop.
    Here’s your **updated, trimmed-down task list** — focused only on what remains relevant now.
    
    ---
    
    ## 🧭 Current Focus (as of now)
    
    ### 🔥 1. Unified Detector Harness
    
    * Finalize the pipeline that runs on:
    
      * **Single photo**
      * **Photo set**
      * **Video frames**
    * Confirm frame-by-frame inference consistency (no async drift).
    * Integrate directly with **Arachna Detector Mac app** (you already have the overlay visualization).
    * Package as an **SPM module** for reuse in:
    
      * SpiderApp (TestFlight)
      * CLI utilities
    * Ensure commit workflow stays enforced (only `git status`, `git diff`, and `commit.sh` allowed).
    
    ---
    
    ### ⚙️ 2. Square Crop Logic (In Progress)
    
    * Implement consistent square cropping for training + inference.
    * **Logic:**
    
      * Center crop using **long edge**; if it exceeds bounds, use **short edge** fallback.
      * Clamp coordinates to image bounds.
      * Apply uniformly across training export, inference, and preview.
    * Propagate to:
    
      * Training crop exporter
      * Classifier inference path
      * Detector debugging pipeline
    * Test on edge cases (extreme aspect ratios, partial spiders near image borders).
    
    ---
    
    ### 🧩 3. Classifier Retraining (Post-Harness)
    
    * Use current best quality-gated crops.
    * Train **v18 classifier** with new square crops.
    * Validate reduction in false positives (esp. blurred recluse / false widow).
    * Integrate into detector harness → test live performance.
    
    ---
    
    ### 🗺️ 4. Geo-Aware Dataset Filtering
    
    * Add geolocation to Postgres (PostGIS or lat/long columns).
    * Prune non-US species to reduce model confusion.
    * Update export commands to filter by region before training.
    
    ---
    
    ### 🧠 5. Training Workflow Documentation
    
    * Continue building the **`training/` Git folder**:
    
      * `TRAINING_NOTES.md` — versioned command recipes.
      * `scripts/` — reproducible CLI commands (dataset exports, crops, training).
      * `.gitignore` — exclude all media; commit only scripts + notes.
    * Optional: store square-crop parameters and seed values with each model version.
    * Goal: every model (e.g., `classifier_v18`, `detector_v3`) has a matching **command lineage** recorded in Git.
    
    ---
    
    ### 🧘 6. Quality Work (Background)
    
    * Tag more low-quality photos when you have downtime.
    * Incrementally improve Quality v4 → v5 later.
    
    ---
    
    ### ✅ Active Priorities Order
    
    | Priority | Focus                                       |
    | -------- | ------------------------------------------- |
    | 1️⃣      | Unified Detector Harness (core pipeline)    |
    | 2️⃣      | Square Crop Logic (integrated into harness) |
    | 3️⃣      | Classifier Retraining w/ Square Crops       |
    | 4️⃣      | Geo-Aware Dataset Filtering                 |
    | 5️⃣      | Training Notes & Scripts in Git             |
    | 6️⃣      | Quality Tagging (low-energy filler work)    |
    
    ---
    
    Would you like me to generate a compact Markdown checklist version (`SpiderApp_Tasks.md`) you can drop into your repo for tracking this sprint?
    
    
  • Fletcher Penney

    Paul,

    Thanks for that example. With that, I was able to reproduce the issue. The problem is the improperly indented list, for example the two leading spaces in front of `* **Single photo**`. The improper list is messing up my list cleanup algorithm, so I need to figure out how to make that more resilient.
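To find such lines before opening a pasted document, something like the following Python sketch could flag list items with irregular indentation. It assumes the common convention of four spaces (or one tab) per nesting level; the thread does not say what Composer's cleanup algorithm actually expects, and `find_irregular_items` is a hypothetical helper, not anything from the app.

```python
import re

# Optional leading whitespace, then a bullet (*, +, -) or a "1." / "1)" marker.
LIST_ITEM = re.compile(r"^([ \t]*)(?:[*+-]|\d+[.)])\s+")

def find_irregular_items(text, indent_width=4):
    """Return line numbers of list items whose indent is not a whole number of levels."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = LIST_ITEM.match(line)
        if not match:
            continue
        # Expand tabs to the next multiple of indent_width before measuring.
        width = len(match.group(1).expandtabs(indent_width))
        if width % indent_width != 0:
            flagged.append(number)
    return flagged

# Excerpt shaped like the sample in this thread: nested bullets with two leading spaces.
sample = (
    "* Finalize the pipeline that runs on:\n"
    "\n"
    "  * **Single photo**\n"
    "  * **Photo set**\n"
    "* Confirm frame-by-frame inference consistency (no async drift).\n"
)
print(find_irregular_items(sample))
```

The flagged line numbers point at the bullets worth outdenting or re-indenting by hand.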

     

    In the meantime, highlight the entire list (through `Ensure commit workflow…`) and press Cmd-[ to shift the lines to the left, thereby removing the leading whitespace. Then you can indent individual lines as desired.

    You would have to do this for each list. Alternatively, you could select the entire document and do it in a single step, provided you have no leading whitespace that is not part of a list (e.g. indented code blocks).
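The same cleanup can be approximated outside the editor before pasting. This is a rough Python sketch under stated assumptions, not what Cmd-[ does internally: it strips leading whitespace only from lines that look like list items, so other indented lines are left alone. The name `outdent_list_items` is hypothetical.

```python
import re

# Leading whitespace that is immediately followed by a list marker and a space.
INDENTED_ITEM = re.compile(r"^[ \t]+(?=(?:[*+-]|\d+[.)])\s)")

def outdent_list_items(text):
    """Strip leading whitespace from list-item lines; leave all other lines untouched."""
    return "\n".join(INDENTED_ITEM.sub("", line) for line in text.split("\n"))

# Hypothetical input: a nested bullet plus an indented non-list line.
messy = "* Parent\n\n  * **Single photo**\n    indented code line\n"
print(outdent_list_items(messy))
```

Like the manual workaround, this flattens every list to the left margin, so nested items still need to be re-indented afterwards. A line inside an indented code block that happens to start with `* ` or `- ` would also be outdented, so the result is worth reviewing.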

     

     

    Thanks again!!

  • Fletcher Penney

    A new build has been pushed that should fix this.

    Thanks again!

