Disallowed and Sites With Quotas
'''Disallowed to Crawl Sites''' are URLs or domains, listed one per line, that Yioop should not crawl. <br />
A line like: <pre> http://www.somewhere.com/foo/ </pre> would prevent the URL <pre> http://www.somewhere.com/foo/goo.jpg </pre> from being crawled. <br />
A line like: <pre> domain:foo.com </pre> would prevent the URL <pre> http://a.b.c.foo.com/blah/ </pre> from being crawled. <br />
It is also possible to disallow a site using a regular expression. For example, <pre> regex:/foo\d+/ </pre> would disallow any URL containing the string "foo" followed by one or more digits. <br />
'''Sites with Quotas''' are URLs or domains from which Yioop should crawl at most a fixed number of URLs per hour. These are listed in the same text area as Disallowed to Crawl Sites. To set a quota, append a fragment of the form #some_number to the URL. For example, <pre> http://www.yelp.com/#100 </pre> would restrict crawling of URLs from Yelp to 100 per hour.
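The matching rules described above can be illustrated with a small Python sketch. This is not Yioop's own code. The function names (`parse_rules`, `matches`, `is_disallowed`, `quota_for`) are hypothetical, and the sketch assumes that plain URL lines match by prefix, that `domain:` lines match the host and any of its subdomains, and that the slashes around a `regex:` pattern are delimiters.

```python
import re
from urllib.parse import urlsplit


def parse_rules(text):
    """Split the text area contents into disallow rules and hourly quotas."""
    disallowed, quotas = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Assumption: a trailing "#<digits>" marks a quota line;
        # regex: lines are never treated as quotas.
        m = re.fullmatch(r"(.+)#(\d+)", line)
        if m and not line.startswith("regex:"):
            quotas[m.group(1)] = int(m.group(2))
        else:
            disallowed.append(line)
    return disallowed, quotas


def matches(rule, url):
    """Return True if url falls under a single rule line."""
    if rule.startswith("domain:"):
        domain = rule[len("domain:"):].lower()
        host = (urlsplit(url).hostname or "").lower()
        # Assumption: the domain itself and all of its subdomains match.
        return host == domain or host.endswith("." + domain)
    if rule.startswith("regex:"):
        pattern = rule[len("regex:"):]
        # Assumption: surrounding slashes are delimiters, not part of the pattern.
        if len(pattern) >= 2 and pattern[0] == "/" and pattern[-1] == "/":
            pattern = pattern[1:-1]
        return re.search(pattern, url) is not None
    # Assumption: a plain URL line matches any URL that starts with it.
    return url.startswith(rule)


def is_disallowed(url, disallowed):
    """Return True if any disallow rule covers url."""
    return any(matches(rule, url) for rule in disallowed)


def quota_for(url, quotas):
    """Return the hourly quota that applies to url, or None if there is none."""
    for site, limit in quotas.items():
        if matches(site, url):
            return limit
    return None


rules_text = "\n".join([
    "http://www.somewhere.com/foo/",
    "domain:foo.com",
    r"regex:/foo\d+/",
    "http://www.yelp.com/#100",
])
disallowed, quotas = parse_rules(rules_text)
```

With these four example lines, `is_disallowed("http://www.somewhere.com/foo/goo.jpg", disallowed)` is true and `quota_for("http://www.yelp.com/biz/x", quotas)` is 100. Enforcing the quota (counting fetches per site per hour) is left out of the sketch.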