DocHunt: A Multi-Language Hack to Migrate Kipwise Documentation

15 December 2024

From locked-In to lightweight: A programmatic migration from Kipwise to Markdown that saved Recursion $15K/Year

In the ever-evolving landscape of software development, maintaining documentation is crucial, but the tools we use to manage it can become outdated or expensive. In this post, I’d like to share how we successfully migrated thousands of documents from Kipwise to plain Markdown using a custom-built solution, and in the process, saved Recursion approximately $15,000 USD per year! 💰

The Challenge

After switching to Swimm as Recursion’s primary documentation platform, we found nearly 2,000 documents in 466 subdirectories still residing in Kipwise, at an annual cost of approximately $15,000 USD. The challenge was clear: we needed to migrate all this content to a simpler, more cost-effective solution while maintaining the hierarchical structure and formatting of our documentation pages.

While Kipwise served us well for a time, it came with a hefty annual bill and had started to feel like a closed box-limited in customizability and difficult to version control. To make things trickier, Kipwise didn’t seem to expose a public REST API. At least not one we could find.

This unique situation raised a key question: Could we migrate these documents to plain Markdown and integrate them with our internal docs, even without a visible API in Kipwise? 🥁
Turns out, we could. 😎

The Solution: A Multi-Language Approach

To handle the migration, I built DocHunt: a set of scripts combining JavaScript, Bash, and Python.

Let’s break it down.

1. Expanding the Tree with expand.js

We started in the browser. I discovered that Kipwise actually has a REST API under the hood, which I was able to find after some wizardry with the Inspect tool. I also noticed that Kipwise lazily loads its document tree, so we had to ensure all nodes were expanded before we could scrape the URLs.

Expand Tree Nodes JavaScript Code


function expandTreeNodes() {
    const nodesToClick = document.querySelectorAll('span.cOnmID.Icon--caret-right--2jLJYkxd');
    if (nodesToClick.length === 0) return console.log("All nodes expanded");

    nodesToClick.forEach(node => {
        let ancestor = node;
        for (let i = 0; i < 3; i++) ancestor = ancestor.parentElement;
        console.log("Expanding:", ancestor.innerText.trim());
        node.click();
    });
    setTimeout(expandTreeNodes, 100); // loop until all are expanded
}
expandTreeNodes();

I ran this in the browser console while viewing the Kipwise docs tree. With this step I expand all nodes in the Kipwise UI tree and make sure no documents are left hidden.

2. Extracting Document URLs

Once everything was visible, I ran another script to extract all document URLs and metadata from the Document Object Model (DOM). The DOM is a programming interface used by web browsers to represent and interact with the structure of an HTML or XML document. Since it exposes the content of the page, such as elements, attributes, and text, as a tree of objects that JavaScript can access and manipulate, we took advantage of it. Eventually, we discovered that the API endpoint used to retrieve individual documents was:
https://webapi.kipwise.com/1.0/documents/title.

Extract document URL


const links = document.querySelectorAll('a[href]');
const results = [];

links.forEach(link => {
    const folderId = link.getAttribute('folderid');
    const href = link.getAttribute('href');
    const match = href?.match(/\/contents\/([a-f0-9\-]{36})/);

    if (folderId && match) {
        const title = link.innerText.trim();
        const url = `https://webapi.kipwise.com/1.0/documents/${match[1]}/`;
        results.push(`${folderId} ${url} ${title}`);
    }
});
console.log(results.join('\n'));

Here, I am scrapping titles and URLs while preserves the directory hierarchy. As a result, we get a raw list of folder UUIDs, document URLs, and titles.

3. Downloading JSON

Now that we have the doc page URLs, I used the curl command to fetch the content of each doc page via Kipwise’s undocumented internal API:

Curl and get the .json!


#!/bin/bash
input_file="raw_output.txt"
output_dir="./output_json/"
mkdir -p "$output_dir"

while read -r line; do
  url=$(echo "$line" | awk '{print $2}')
  output_file="$output_dir/$(basename "$url").json"
  echo "Fetching: $url"
  curl "$url" \
    -H "x-kip-token: MY_TOKEN" \
    -H "x-team-id: TEAM_ID" \
    -H "Accept: application/json" > "$output_file"
  sleep 2
done < "$input_file"

By fetching the JSON files associated with each do page and temporarily saving to disk, we preserve the original structure.

4. Converting JSON to Markdown

The Kipwise API returns deeply nested JSON, so I wrote a converter in Python that handles headings, bold text, paragraphs, lists, and even links:

Convert .json to .md


def format_text(leaves):
    return "".join("**" + l["text"] + "**" if "marks" in l and any(m["type"] == "strong" for m in l["marks"]) else l["text"] for l in leaves)

def format_markdown(node):
    t = node.get("type", "")
    if t == "title": return f"# {format_text(node['nodes'][0]['leaves'])}"
    if t == "heading-two": return f"## {format_text(node['nodes'][0]['leaves'])}"
    if t == "paragraph":
        return "".join(format_text(c["leaves"]) if c["object"] == "text" else f"[{format_text(c['nodes'][0]['leaves'])}]({c['data']['href']})" for c in node.get("nodes", []))
    if t == "list-item": return f"- {''.join(format_markdown(c) for c in node['nodes'])}\n"
    # Fallback
    return "".join(format_markdown(child) for child in node.get("nodes", []))

This step converts rich text blocks to Markdown and handles links, headings, bullets, and special characters. Very important in our docs!

5. Batch Conversion

Since we had a lot of files, we needed a way to automate the conversion. This script batch-processed everything and logged any errors for review. It’s essentially a wrapper that ran the converter on all JSON files:

Convert .json to .md


for filename in os.listdir(input_dir):
    if filename.endswith('.json'):
        subprocess.run(
            ['python', 'json_to_markdown.py'],
            stdin=open(os.path.join(input_dir, filename), 'r'),
            stdout=open(os.path.join(output_dir, filename.replace('.json', '.md')), 'w')
        )

It was key to report failures and inconsistencies in this step!

6. Rebuilding the Folder Structure

Finally, we rebuilt the folder hierarchy using the original Kipwise structure stored in a JSON mapping. Each file was renamed and moved accordingly:

def clean_filename(name):
    return re.sub(r'[^\w\-_.]', '', name.replace(' ', '_'))

def create_directory_structure(...):
    # Load folder UUID mappings
    with open(folders_json) as f:
        folders = json.load(f)
    ...
    for line in open(raw_output_file):
        folder_uuid, url, title = parse_line(line)
        folder_path = folders.get(folder_uuid)
        dest_dir = os.path.join(full_output_dir, *map(clean_filename, folder_path))
        os.makedirs(dest_dir, exist_ok=True)

        uuid = extract_uuid(url)
        src_file = os.path.join(output_base_dir, f"{uuid}.md")
        dest_file = os.path.join(dest_dir, f"{clean_filename(title)}.md")
        shutil.copyfile(src_file, dest_file)

Results

📄 Migrated: 1,978 files
📁 Subdirectories preserved: 466
💾 Final size: ~11MB of Markdown
💰 Cost savings: $15,000/year

We ended up with a new, Git-friendly documentation system organized by topics like development, Discovery Platform, among many others.

Lessons Learned

Don’t fear migration projects! They can be fun and impactful 📈
Start simple, then scale. ⬆️
Having logs at every step saved us during debugging! 🐛
Markdown is a surprisingly powerful yet simple documentation format. 📃
Cost-saving side projects can build momentum and trust! 💰

Conclusion

This project started as a cost-saving initiative but ended up delivering much more: autonomy over our documentation, better version control, and simpler backups. By stitching together browser scripts, shell tools, and Python logic, we built a hacky but effective solution to take our knowledge back from a locked-down SaaS and into our own version-controlled hands.

If you’re stuck with costly platforms for something as fundamental as documentation, it might be time to plan your own hunt! 🦆

Le Miroir