mirror of https://github.com/lobehub/lobe-chat.git synced 2026-06-13 19:20:04 +00:00

@lobechat/web-crawler

LobeHub's built-in web crawling module for intelligent extraction of web content and conversion to Markdown format.

📝 Introduction

@lobechat/web-crawler is a core component of LobeHub responsible for intelligent web content crawling and processing. It extracts valuable content from various webpages, filters out distracting elements, and generates structured Markdown text.

🛠️ Core Features

- Intelligent Content Extraction: Identifies the main content of a page using Mozilla's Readability algorithm
- Multi-level Crawling Strategy: Supports multiple crawling implementations, including basic (naive) crawling, Jina, Search1API, and Browserless rendering
- Custom URL Rules: Handles site-specific crawling logic through a flexible rule system
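
As an illustration of the multi-level strategy, the sketch below tries a list of crawling implementations in order and returns the first non-empty result. It is a minimal sketch, not the module's actual code: the `CrawlImpl` type, the `crawlWithFallback` function, and the stub registry are hypothetical names introduced for this example, and the assumption that implementations are tried in list order until one succeeds is not stated in this README.

```typescript
// Hypothetical signature: an implementation takes a URL and resolves to
// Markdown content, or undefined when it could not extract anything.
type CrawlImpl = (url: string) => Promise<string | undefined>;

interface CrawlResult {
  impl: string;
  content: string;
}

// Try each named implementation in order; skip unknown names, failures,
// and empty results. Returns undefined when every implementation fails.
async function crawlWithFallback(
  url: string,
  implNames: string[],
  registry: Record<string, CrawlImpl>,
): Promise<CrawlResult | undefined> {
  for (const name of implNames) {
    const impl = registry[name];
    if (!impl) continue;
    try {
      const content = await impl(url);
      if (content && content.trim().length > 0) {
        return { impl: name, content };
      }
    } catch {
      // This implementation failed; fall through to the next one.
    }
  }
  return undefined;
}

// Stub implementations for demonstration only (no network access).
const stubRegistry: Record<string, CrawlImpl> = {
  naive: async () => {
    throw new Error('blocked by the target site');
  },
  jina: async (url) => `# Content fetched for ${url}`,
};

crawlWithFallback('https://example.com', ['naive', 'jina'], stubRegistry).then((result) => {
  console.log(result?.impl); // prints "jina"
});
```

Here the `naive` stub throws, so the loop moves on and the `jina` stub supplies the content.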

🤝 Contribution

Web page structures are diverse and complex. We welcome community contributions of crawling rules for specific websites. You can contribute as described below.

How to Contribute URL Rules

1. Add new rules to the urlRules.ts file
2. Rule example:

// Example: handling specific websites
const url = [
  // ... other URL matching rules
  {
    // URL matching pattern, supports regex
    urlPattern: 'https://example.com/articles/(.*)',

    // Optional: URL transformation, redirects to an easier-to-crawl version
    urlTransform: 'https://example.com/print/$1',

    // Optional: specify crawling implementation, supports 'naive', 'jina', 'search1api', and 'browserless'
    impls: ['naive', 'jina', 'search1api', 'browserless'],

    // Optional: content filtering configuration
    filterOptions: {
      // Whether to enable Readability algorithm for filtering distracting elements
      enableReadability: true,
      // Whether to convert to plain text
      pureText: false,
    },
  },
];
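
To make the interaction between urlPattern and urlTransform concrete, here is a minimal sketch of how a rule could be applied to an incoming URL. It assumes the pattern is compiled with `new RegExp` and that `$1` in urlTransform refers to the first capture group, as in JavaScript's `String.prototype.replace`. The `UrlRule` interface and the `applyUrlRule` helper are hypothetical names for illustration, not the module's actual API.

```typescript
// Hypothetical rule shape, mirroring some of the fields in the rule example.
interface UrlRule {
  urlPattern: string;
  urlTransform?: string;
  impls?: string[];
}

interface AppliedRule {
  transformedUrl: string;
  impls?: string[];
}

// Hypothetical helper: find the first rule whose pattern matches the URL
// and rewrite the URL with that rule's transform, if it has one.
function applyUrlRule(url: string, rules: UrlRule[]): AppliedRule | undefined {
  for (const rule of rules) {
    const regex = new RegExp(rule.urlPattern);
    if (!regex.test(url)) continue;

    // "$1" in urlTransform is replaced by the first capture group.
    const transformedUrl = rule.urlTransform ? url.replace(regex, rule.urlTransform) : url;
    return { transformedUrl, impls: rule.impls };
  }
  return undefined; // no rule matched
}

const exampleRules: UrlRule[] = [
  {
    urlPattern: 'https://example.com/articles/(.*)',
    urlTransform: 'https://example.com/print/$1',
    impls: ['naive'],
  },
];

const applied = applyUrlRule('https://example.com/articles/hello-world', exampleRules);
console.log(applied?.transformedUrl); // prints "https://example.com/print/hello-world"
```

Because the pattern is treated as a regular expression, an unescaped `.` matches any character; a contributed rule may want to escape literal dots (for example `example\\.com` inside a string literal) when a stricter match matters.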

Rule Submission Process

1. Fork the LobeHub repository
2. Add or modify URL rules
3. Submit a Pull Request describing:
   - Target website characteristics
   - Problems solved by the rule
   - Test cases (example URLs)

📌 Note

This is an internal module of LobeHub (marked "private": true). It is designed specifically for LobeHub and is not published as a standalone package.