Anthony Lacruz
2025-06-22 10:32:44 +08:00
committed by GitHub

@@ -77,7 +77,7 @@ Handy conversion guide:
 ### Use case: Service crawls a list of urls
-We'll assume we have an initial list of `links_to_crawl` ranked initially based on overall site popularity. If this is not a reasonable assumption, we can seed the crawler with popular sites that link to outside content such as [Yahoo](https://www.yahoo.com/), [DMOZ](http://www.dmoz.org/), etc.
+We'll assume we have an initial list of `links_to_crawl` ranked initially based on overall site popularity. If this is not a reasonable assumption, we can seed the crawler with popular sites that link to outside content such as [Yahoo](https://www.yahoo.com/), [DMOZ](https://dmoz-odp.org/), etc.
 We'll use a table `crawled_links` to store processed links and their page signatures.