cupertino-docs/docs/samplecode
Mihaela Mihaljevic 20c7ba9d80 data: merge Claw mini's 5.5-day crawl into the v1.1.0 corpus
Merge of Studio's v1.1.0 corpus (412,523 files) with Claw mini's
342,790-file crawl, after per-file inspection of the SHA-different and
Claw-only buckets:

  498 overwrites from Claw   (Claw's version had richer content; Apple
                              updated the page between the Studio crawl
                              in early May and the Claw crawl in mid-May)
2,285 new files from Claw    (URLs Claw discovered that Studio missed,
                              mostly hash-suffixed Swift
                              overload-disambiguation pages)
  153 dropped at boundary    (Claw-only URLs whose content was poison,
                              i.e. a JS-fallback or React 404 sub-view,
                              filtered out before commit)

Studio's existing 412,523 files plus 2,285 new from Claw = 414,807 files
total. Schema unchanged from v1.1.0. Post-merge 13-category poison audit
(414,807 files): 0 matches across all categories.
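
A category-based poison audit of this kind might look like the sketch below. The 13 real categories are not listed in this commit, so the two patterns here are hypothetical stand-ins for the JS-fallback and React 404 cases, and `audit` is an illustrative name rather than the actual tool.

```python
import re

# Hypothetical markers; the real audit's 13 categories are not given here.
POISON_PATTERNS = {
    "js_fallback": re.compile(r"requires JavaScript", re.IGNORECASE),
    "react_404": re.compile(r"page (?:not found|can't be found)", re.IGNORECASE),
}


def audit(texts: dict[str, str]) -> dict[str, list[str]]:
    """Return, per category, the paths whose text matches that category."""
    hits: dict[str, list[str]] = {name: [] for name in POISON_PATTERNS}
    for rel, text in texts.items():
        for name, pattern in POISON_PATTERNS.items():
            if pattern.search(text):
                hits[name].append(rel)
    return hits


def is_clean(hits: dict[str, list[str]]) -> bool:
    """True when no category has any match."""
    return all(not paths for paths in hits.values())
```

A clean corpus in this sketch is one where `is_clean(audit(texts))` holds, which corresponds to the "0 matches across all categories" result.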

Companion writeup: blog draft '2026-05-14-merging-two-apple-doc-crawls.md'.
Source-corpus state at this commit is what cupertino-docs@v1.1.1 will tag.
2026-05-14 18:49:44 +02:00
documentation_samplecode.json data: merge Claw mini's 5.5-day crawl into the v1.1.0 corpus 2026-05-14 18:49:44 +02:00