redesign: replace docs with new IA from Pixee-Marketing-OS PR #117 #256

Open
dhafley wants to merge 7 commits into main from redesign/v2-content

@dhafley
Contributor

@dhafley dhafley commented May 5, 2026

Summary

Replaces the existing ~9-page docs.pixee.ai with the 71-page redesigned IA from Pixee-Marketing-OS PR #117 (merged 2026-04-28).

  • 71 new pages across 10 sections (api, configuration, enterprise, faq, getting-started, how-it-works, integrations, languages, open-source, platform). Frontmatter dedup'd, track field normalized to lowercase, numeric prefixes dropped, meta_description renamed to description, sidebar_position injected.
  • Welcome page promoted to /. Pre-existing React landing (src/pages/index.js + HomepageFeatures component) deleted. Sidebar's "Getting Started" category now lands at root.
  • /integrations/contrast authored from scratch (PR #117 dropped Contrast from the IA; we kept it because it's still in the new sidebar). Drafted in the new docs voice from public Contrast Security docs + the existing 4-line stub. Worth a careful read before merge.
  • 18 redirect rules in docusaurus.config.js map every old URL to its closest new equivalent. Existing /integrations/* → /code-scanning-tools/* redirects flipped to point the new direction.
  • SEO additions (config-only, no React components):
    • Site-wide Organization JSON-LD via headTags
    • docusaurus-plugin-llms generates llms.txt + llms-full.txt at build
    • static/robots.txt explicitly allows GPTBot, ClaudeBot, PerplexityBot, Google-Extended, etc.
  • migration/ archive at repo root contains migrate.py, fixup_links.py, ASSESSMENT.md, and README.md — historical record only. Do not re-run.

Deferred to v2 (per scope agreement, not in this PR): in-page audience badge, <SchemaOrg> per-page JSON-LD (FAQPage / HowTo), raw-.md alternates for AI agents, Algolia DocSearch, HubSpot lead capture, GA4 custom events.

What to review

| Priority | What |
|---|---|
| 🔴 1 | docs/integrations/contrast.md: wholly new content, needs technical accuracy check |
| 🟡 2 | docusaurus.config.js redirects: confirm /running_on_public_github_repos → /configuration/repositories is the right target; otherwise we should change it to /getting-started/github |
| 🟡 3 | Sidebar order and labels: eyeball in dev preview |
| 🟢 4 | Spot-check 5 random pages for tone / accuracy |
| 🟢 5 | migration/ASSESSMENT.md: captures all decisions and tradeoffs |

Test plan

  • yarn build clean — 72 docs processed, zero broken links
  • yarn serve — all 72 page slugs return 200, all 10 category landings render with correct titles
  • Redirects emit meta-refresh + canonical (/intro → /, /code-scanning-tools/sonar → /integrations/sonarqube, /faqs → /faq/general, etc.)
  • Sitemap, robots.txt, llms.txt, Organization JSON-LD verified in build output
  • Reviewer confirms Contrast page is technically accurate
  • Reviewer confirms /running_on_public_github_repos redirect target is correct
  • CI build passes on this PR

🤖 Generated with Claude Code

dhafley and others added 3 commits May 5, 2026 14:05
Migrates 71 markdown pages from Pixee-Marketing-OS/10_execute_short_term/pixee_docs/
into a new 10-section information architecture, replacing the previous ~9 thin pages.

Content changes:
- 71 new pages across 10 sections: getting-started, platform, how-it-works,
  integrations, configuration, enterprise, languages, api, open-source, faq.
- New /integrations/contrast page authored to preserve coverage from the old
  /code-scanning-tools/contrast (PR #117 had no Contrast equivalent).
- Removes 9 stale top-level pages and the /code-scanning-tools/ section.
- Removes leftover Docusaurus demo src/pages/markdown-page.md.

Sidebar:
- Autogen sidebar with per-section _category_.json files providing track badges
  ([DEV] / [LEADER] / [BOTH]) in category labels and curated section ordering.
- Each section's overview page is set as the category landing via link.id.
- how-it-works uses a generated-index card list.

Frontmatter normalization (handled in migration):
- Numeric file prefixes dropped, replaced with sidebar_position in frontmatter.
- track field case normalized to lowercase across 71 files.
- Duplicate frontmatter keys deduped in 29 files (last-wins).
- meta_description renamed to Docusaurus-standard description field.
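
The four normalization steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of migrate.py (the script itself is not shown in this conversation). The function name `normalize_frontmatter`, the flat `key: value` parsing, and the `03-sonarqube.md` style of numeric prefix are all assumptions made for the example.

```python
import re


def normalize_frontmatter(lines, filename):
    """Normalize simple `key: value` frontmatter lines (hypothetical helper).

    Assumes flat, single-line keys; real frontmatter may need a YAML parser.
    Returns the normalized fields and the filename without its numeric prefix.
    """
    fields = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not key/value pairs
        # A later duplicate overwrites the earlier value (last-wins dedup).
        fields[key.strip()] = value.strip()

    # Rename meta_description to the Docusaurus-standard description field.
    if "meta_description" in fields:
        fields["description"] = fields.pop("meta_description")

    # Normalize the track field to lowercase.
    if "track" in fields:
        fields["track"] = fields["track"].lower()

    # Turn an assumed numeric prefix such as "03-" into sidebar_position.
    match = re.match(r"(\d+)[-_](.+)", filename)
    if match:
        fields["sidebar_position"] = str(int(match.group(1)))
        filename = match.group(2)

    return fields, filename
```

Using a dict makes the last-wins behaviour fall out naturally: assigning an existing key replaces its value, so the final occurrence of a duplicated key is the one that survives.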

Redirects (docusaurus.config.js plugin-client-redirects, 17 rules):
- All old top-level URLs map to closest new equivalents.
- All /code-scanning-tools/* URLs map to new /integrations/* equivalents.
- Pre-existing /integrations alias rules updated to point at new IA.

SEO additions (config-only, no React components added):
- Site-wide Organization JSON-LD via headTags in docusaurus.config.js.
- docusaurus-plugin-llms generates llms.txt and llms-full.txt at build.
- static/robots.txt explicitly allows GPTBot, ClaudeBot, PerplexityBot,
  Google-Extended, Applebot-Extended, CCBot, OAI-SearchBot.
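
One way to sanity-check that a robots.txt really allows the crawlers listed above is Python's standard `urllib.robotparser`. The sketch below is illustrative only: the actual contents of static/robots.txt are not shown in this PR, so the file body is reconstructed as an assumption, and `ExampleBlockedBot` is a hypothetical entry added purely to show that the check can fail.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the commit message above.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
    "Applebot-Extended", "CCBot", "OAI-SearchBot",
]

# Assumed file body: one explicit Allow entry per crawler, plus a
# hypothetical blocked agent so the check has something to reject.
ROBOTS_TXT = "\n\n".join(f"User-agent: {bot}\nAllow: /" for bot in AI_BOTS)
ROBOTS_TXT += "\n\nUser-agent: ExampleBlockedBot\nDisallow: /\n"


def allowed_bots(robots_txt, bots, path="/"):
    """Return {bot: True/False} for whether each bot may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}
```

Note that `can_fetch` also returns `True` for agents with no matching entry, so an explicit Allow mainly documents intent; the check is most useful for catching an accidental Disallow.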

Verification:
- yarn build clean (72 docs processed, zero broken links).
- All 72 page slugs return 200 against yarn serve.
- Sidebar order matches the spec; track badges visible on category labels.
- Redirects emit correct meta-refresh + canonical link in build output.
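
The meta-refresh and canonical check in the last bullet could be automated along these lines. This is a sketch under assumptions: the exact HTML that the redirect plugin emits is not shown in this PR, so `SAMPLE_REDIRECT_HTML` is a hand-written stand-in, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Hand-written stand-in for a generated redirect page (assumed shape).
SAMPLE_REDIRECT_HTML = (
    '<html><head><meta charset="UTF-8">'
    '<meta http-equiv="refresh" content="0; url=/integrations/sonarqube">'
    '<link rel="canonical" href="https://docs.pixee.ai/integrations/sonarqube">'
    "</head></html>"
)


class RedirectPageParser(HTMLParser):
    """Collects the meta-refresh target and canonical href from a page."""

    def __init__(self):
        super().__init__()
        self.refresh_target = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # The content attribute looks like "0; url=/target".
            content = attrs.get("content") or ""
            _, _, target = content.partition("url=")
            self.refresh_target = target.strip() or None
        elif tag == "link" and attrs.get("rel") == "canonical":
            self.canonical = attrs.get("href")


def check_redirect(html, expected_path):
    """True if the page refreshes to expected_path and has a matching canonical."""
    parser = RedirectPageParser()
    parser.feed(html)
    return (
        parser.refresh_target == expected_path
        and parser.canonical is not None
        and parser.canonical.endswith(expected_path)
    )
```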

Deferred to v2 (per scope agreement):
- React components: AudienceBadge, SchemaOrg, FeedbackWidget.
- Per-page FAQPage / HowTo JSON-LD.
- Raw .md alternates for AI agents.
- Algolia DocSearch and HubSpot lead capture.

Note: src/pages/index.js (the PixeeDocs hero landing) is unchanged in this PR.
The new welcome page lives at /getting-started; whether to redirect / -> /getting-started
or rework the React landing is a v2 decision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the pre-existing PixeeDocs React landing (src/pages/index.js +
HomepageFeatures component) and promotes the welcome page to the site root
by setting slug: / on docs/getting-started/getting-started.md.

The "Getting Started" sidebar category now lands at / via its existing
link.id reference; the welcome page's subpaths (/getting-started/github,
/getting-started/gitlab, etc.) remain unchanged and still resolve.

Redirect updates in docusaurus.config.js:
- /intro, /installing, /supported-scms now point to / (was /getting-started)
- New rule: /getting-started -> / (catches stale links and old shares)

Body content updates: 5 internal markdown links from `](/getting-started)`
rewritten to `](/)` so navigation goes directly to the welcome page rather
than hitting the redirect chain.
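
A rewrite like this has to leave subpaths such as /getting-started/github alone. The following is a minimal sketch of one way to do that with a regular expression; it is not the code actually used for this commit, and `rewrite_root_links` is a hypothetical name.

```python
import re

# Matches a markdown link target that is exactly /getting-started
# (optionally with a trailing slash or a #anchor), but not subpaths
# such as /getting-started/github.
ROOT_LINK = re.compile(r"\]\(/getting-started/?(#[^)]*)?\)")


def rewrite_root_links(markdown):
    """Rewrite `](/getting-started)` to `](/)`, preserving any #anchor.

    Returns the new text and the number of links rewritten.
    """
    return ROOT_LINK.subn(
        lambda m: "](/" + (m.group(1) or "") + ")", markdown
    )
```

Returning the replacement count makes it easy to confirm that the number of rewritten links matches what the commit message reports.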

Verification: yarn build clean. / returns "Welcome to Pixee" title; all
10 sidebar track-badged categories still render; spot-checked content pages
return 200; legacy URLs redirect correctly via meta-refresh.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the [DEV] / [LEADER] / [BOTH] suffixes from sidebar category
labels. The track field stays in page frontmatter for potential v2 use
(in-page audience badge), but the sidebar reads cleaner without them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dhafley dhafley requested review from daharmattan1 and sip49 May 5, 2026 18:26
Adds migration/ at the repo root as a historical record of the 2026-05-05
docs redesign. Contents:

- ASSESSMENT.md — planning and decision log, including the three-repo deploy
  flow, redirect table, SEO additions, and what was actually executed.
- migrate.py — one-shot Python script that ported PR #117 content into
  docs/docs/, normalized frontmatter, dropped numeric prefixes, generated
  _category_.json files.
- fixup_links.py — one-shot link-fixup pass that fixed 27 internal markdown
  links across 9 files after migrate.py.
- README.md — orientation for future readers, plus a clear DO-NOT-RE-RUN
  warning (migrate.py would wipe the manually-authored Contrast page and
  revert the welcome doc's slug: /).

Lives at the repo root rather than docs/migration/ so Docusaurus does not
treat these files as published pages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dhafley dhafley force-pushed the redesign/v2-content branch from b176d34 to 0c0b51f Compare May 5, 2026 18:27
Restructures the flat /integrations/<x> layout into two clean
subcategories that match the way the integrations actually divide:
SCM platforms (where Pixee delivers fixes) and scanning tools (where
findings come from). Moves URLs from /integrations/<x> to
/integrations/{scms,scanners}/<x> and adds redirects.

SCMs (4 pages) under docs/integrations/scms/:
- github.md       (renamed from github-platform.md, content unchanged)
- gitlab.md       (split out from scm-platform-reference.md)
- azure-devops.md (split out from scm-platform-reference.md)
- bitbucket.md    (split out from scm-platform-reference.md)

Scanners (14 pages) under docs/integrations/scanners/:
- appscan.md, checkmarx.md, codeql.md, contrast.md, gitlab-sast.md,
  semgrep.md, snyk-code.md, sonarqube.md, veracode.md (moved from flat)
- polaris.md, fortify.md   (split out from commercial-scanners.md)
- trivy.md, defectdojo.md  (split out from oss-aggregator-scanners.md)
- gitlab-sca.md            (newly authored to match the SCA scope —
  this content was missing from PR #117 and needs colleague review)

Removes three consolidated wrapper pages now that each scanner / SCM
has its own page: commercial-scanners.md, oss-aggregator-scanners.md,
scm-platform-reference.md.

Sidebar (autogen, no hand-built):
- /integrations/overview        (sidebar_position: 1)
- /integrations/sarif-universal (sidebar_position: 2)
- Source Control subcategory    (position: 3, generated-index landing)
- Scanning Tools subcategory    (position: 4, generated-index landing)

Each subcategory gets a generated-index landing at /category/source-control
and /category/scanning-tools respectively, which renders a card list of the
pages inside.
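
For illustration, a `_category_.json` with a generated-index landing might be produced like this. The keys `label`, `position`, and `link` with `type: "generated-index"` follow the Docusaurus category-file format; the helper name `category_file` and the explicit `slug` value are assumptions, since the actual files from this commit are not shown here.

```python
import json


def category_file(label, position, slug):
    """Build the JSON text for a _category_.json with a generated-index landing."""
    return json.dumps(
        {
            "label": label,
            "position": position,
            # generated-index renders a card list of the pages in the folder.
            "link": {"type": "generated-index", "slug": slug},
        },
        indent=2,
    )


# Hypothetical usage for the two subcategories described above.
source_control = category_file("Source Control", 3, "/category/source-control")
scanning_tools = category_file("Scanning Tools", 4, "/category/scanning-tools")
```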

Redirects (added to docusaurus.config.js):
- /integrations/<flat-scanner>      -> /integrations/scanners/<x>  (9 rules)
- /integrations/github              -> /integrations/scms/github
- /integrations/{commercial-scanners,oss-aggregator-scanners,scm-platforms}
                                    -> /integrations/overview
- Pre-existing /code-scanning-tools/* and /integrations/sonar redirects
  retargeted to the new /integrations/scanners/<x> URLs.
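
The nine flat-scanner rules follow a single pattern, so they can be generated rather than typed by hand. A small sketch, assuming the nine names are the pages listed as "moved from flat" above; `scanner_redirects` is a hypothetical helper, and its output would still need to be pasted into docusaurus.config.js.

```python
import json

# The nine scanner pages listed above as moved from the flat layout.
FLAT_SCANNERS = [
    "appscan", "checkmarx", "codeql", "contrast", "gitlab-sast",
    "semgrep", "snyk-code", "sonarqube", "veracode",
]


def scanner_redirects(names):
    """Build {from, to} redirect rules from the flat layout to scanners/."""
    return [
        {"from": f"/integrations/{name}", "to": f"/integrations/scanners/{name}"}
        for name in names
    ]


if __name__ == "__main__":
    # Print the rules as JSON for review before adding them to the config.
    print(json.dumps(scanner_redirects(FLAT_SCANNERS), indent=2))
```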

Body content: 2 internal links updated from /integrations/codeql to
/integrations/scanners/codeql in the new github.md page.

Overview rewrite: integrations-overview.md updated to reflect the new
two-category structure, refreshed coverage matrix (13 scanners), and
new SCM links pointing at /integrations/scms/<x>.

Migration archive: migration/integrations_restructure.py captures the
mechanical operations (file moves, frontmatter updates, body-link
fixup, wrapper deletions) for posterity. Will not be re-run.

Verification:
- yarn build clean (77 docs processed; was 72 before this commit).
- yarn serve verified all 4 SCM pages, all 14 scanner pages, both
  generated-index landings, 8 sample redirects, and sidebar order.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@daharmattan1 daharmattan1 left a comment

Review: New IA Migration from PR #117

Reviewed across 7 dimensions: source fidelity, technical accuracy, redirect correctness, Docusaurus config, content tone, IA completeness, and migration archive.

Summary

Strong migration. All 71 source pages from PR #117 ported faithfully with correct frontmatter normalization. Redirects are comprehensive and all targets resolve. Content tone is clean — no marketing language leakage. The integrations restructure into scanners/ and scms/ subdirectories is a good IA improvement over the flat structure.

Blocking Issues

None.

Non-Blocking Issues

1. /running_on_public_github_repos redirect content gap (docusaurus.config.js)
The redirect to /configuration/repositories is functional but the target page doesn't cover the original content (step-by-step guide for running Pixee on public GitHub repos without tools). Neither does any other page in the new IA. Suggest adding a paragraph to /getting-started/github or the repositories config page covering this use case, or changing the redirect target to /getting-started/github which is a closer topical match.

Nits

2. Source sonarqube.md had duplicate frontmatter keys — PR correctly deduped title and slug that each appeared twice. Nice catch by the migration script. (No action needed, just noting.)

Dimension-by-Dimension Detail

Source Fidelity (5/5 pages sampled): fix-safety, security, agentic-security-engineering, sonarqube, enterprise-overview; all faithful. Body content identical. Frontmatter correctly normalized: meta_description → description, sidebar_position injected, track lowercased, duplicate keys deduped.

Technical Accuracy: Contrast page is well-structured, consistent with CodeQL and Semgrep integration pages. Tone is appropriate for docs.

Redirect Correctness (30+ rules validated): All redirect `to` targets confirmed to exist via slug frontmatter in the new file set. The expanded redirect set covers old top-level pages, code-scanning-tools/* → integrations/scanners/*, and flat integrations/<name> → integrations/scanners/<name> or integrations/scms/<name>. Comprehensive.
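
A check of this kind can be scripted. The sketch below assumes the redirect rules and the set of page slugs have already been extracted into Python data; the function name and the sample data in the usage note are hypothetical. It flags targets that do not exist and targets that are themselves redirected, which would create a chain.

```python
def validate_redirects(redirects, known_slugs):
    """Return a list of problems found in {from, to} redirect rules."""
    sources = {rule["from"] for rule in redirects}
    problems = []
    for rule in redirects:
        if rule["to"] not in known_slugs:
            problems.append(
                f"{rule['from']}: target {rule['to']} does not exist"
            )
        if rule["to"] in sources:
            problems.append(
                f"{rule['from']}: target {rule['to']} is itself redirected"
            )
    return problems
```

For example, with `known_slugs = {"/", "/faq/general"}`, a rule from `/faqs` to `/faq/general` passes, while a rule from `/intro` to `/getting-started` is reported as a missing target.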

Docusaurus Config: Organization JSON-LD data looks correct. docusaurus-plugin-llms registered. _category_.json files sampled (integrations, how-it-works, platform) have correct labels, positions, and link references.

Content Tone (5 pages spot-checked): phased-rollout, faq-general, java, operations-config, commercial-scanners — all factual and neutral. No SEO keyword stuffing, no customer quotes, no JSON-LD in FAQ pages, no competitive comparison tables, no CTAs. Clean docs tone.

IA Completeness: Integrations restructured into scanners/ and scms/ subdirectories — good improvement. 5 new scanner pages added (DefectDojo, Fortify, GitLab SCA, Polaris, Trivy). Consolidated pages verified: operations-config.md covers scheduling + notifications + reporting.

Migration Archive: Located at repo root (migration/), not inside docs/. README has clear "Do not re-run" warnings with explanation of why scripts are destructive. ✅

Comment thread docusaurus.config.js
{ from: "/open-pixee", to: "/open-source/overview" },
{
to: "/code-scanning-tools/overview",
from: "/integrations",

The redirect works, but the original page had specific setup instructions for public repos without tools that don't exist anywhere in the new IA. Consider adding a paragraph to /getting-started/github covering this use case, or retargeting to /getting-started/github as a closer match.

@dhafley
Contributor Author

dhafley commented May 7, 2026

Thanks for the close read across all 7 dimensions, Victor.

On the non-blocking item: agreed — the /running_on_public_github_repos redirect target was a wrong-but-plausible pick at migration time, and the new IA is missing the public-repo-from-zero onboarding flow that the old page covered.

Pushing a follow-up commit that:

  1. Adds a "Public Repositories Without an Existing Scanner" section to /getting-started/github covering the original content's substance (enable GitHub Issues for the dashboard, pick a free scanner — CodeQL via GHAS or SonarQube Cloud — install Pixeebot, what to expect), re-toned to match the new docs voice.
  2. Retargets the redirect to /getting-started/github so old inbound links land on coverage of the original use case.

Will follow up here once the commit is in. The dedup nit needs no action — noted, thanks.

Addresses Victor's review feedback on PR #256.

The pre-migration site had a /running_on_public_github_repos page that
walked new users through setting up Pixee on a public GitHub repo with
no existing scanner: enable Issues for the dashboard, pick a free-tier
scanner (CodeQL via GHAS or SonarQube Cloud), install Pixeebot. The
initial migration redirected that URL to /configuration/repositories,
which is functional as a redirect but does not actually cover the
original use case.

This commit:

1. Adds a "Public Repositories Without an Existing Scanner" section to
   docs/getting-started/github.md covering the three steps (enable
   Issues, connect a free scanner, install Pixeebot), re-toned to
   match the new docs voice. Cross-links to the CodeQL and SonarQube
   scanner integration pages for deeper detail.
2. Retargets the redirect: /running_on_public_github_repos now points
   at /getting-started/github (was /configuration/repositories).

Verification: yarn build clean. Redirect HTML correctly points at the
new target. New section renders in the production build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dhafley
Contributor Author

dhafley commented May 7, 2026

Follow-up landed in 64f9100:

  • Added a "Public Repositories Without an Existing Scanner" section to docs/getting-started/github.md covering the original three-step flow (enable Issues for the dashboard, pick a free scanner — CodeQL via GHAS or SonarQube Cloud, install Pixeebot), with cross-links to the CodeQL and SonarQube scanner pages.
  • Retargeted the redirect: /running_on_public_github_repos → /getting-started/github (was /configuration/repositories).

CI green. Ready for another look when you have a minute.

Two redundancies surfaced by an overlap audit (>=70% Jaccard on 5-word
shingles):

1. configuration/scheduling.md was a 90-line subset of the 249-line
   configuration/operations-config.md, both alive in the sidebar. The
   migration's operations-config consolidation was supposed to absorb
   scheduling but the standalone scheduling.md was never deleted.
   Removing it; the operations page covers everything it covered. Two
   inbound internal links (config-overview.md, sonarqube.md) repointed
   to /configuration/operations. Redirect added: /configuration/scheduling
   -> /configuration/operations.

2. platform/remediation.md and how-it-works/fix-safety.md shared three
   near-identical paragraphs about the independent fix evaluator (76%+
   Jaccard). fix-safety.md is the canonical home (technical guide with
   the three-dimension rubric); the leader-track remediation page does
   not need the full detail. Replacing the three paragraphs in
   remediation.md with a one-paragraph summary that links to fix-safety.
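
For readers unfamiliar with the metric, Jaccard similarity on 5-word shingles can be computed as below. This is a minimal sketch, not the audit tool used for this commit; the tokenization (lowercased `\w+` words) and the function names are assumptions.

```python
import re


def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=5):
    """Jaccard similarity of two texts: |A ∩ B| / |A ∪ B| over k-word shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0  # both texts are shorter than k words
    return len(sa & sb) / len(sa | sb)
```

Two passages would be flagged as redundant when `jaccard(a, b)` meets the audit threshold (0.70 in the message above).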

Verification: yarn build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>