<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Screen Scraping</title>
	<atom:link href="http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/</link>
	<description>Musings on just about everything.</description>
	<lastBuildDate>Wed, 10 Mar 2010 07:01:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: J</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-131745</link>
		<dc:creator>J</dc:creator>
		<pubDate>Mon, 20 Jul 2009 16:10:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-131745</guid>
		<description>We use biterscripting for web scraping our own web pages. There is some good sample code at http://www.biterscripting.com/samples_internet.html , if anyone is interested.

J</description>
		<content:encoded><![CDATA[<p>We use biterscripting for web scraping our own web pages. There is some good sample code at <a href="http://www.biterscripting.com/samples_internet.html" rel="nofollow">http://www.biterscripting.com/samples_internet.html</a>, if anyone is interested.</p>
<p>J</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mitchell</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-405</link>
		<dc:creator>Mitchell</dc:creator>
		<pubDate>Tue, 12 Dec 2006 03:19:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-405</guid>
		<description>Mark - 1/30/2004 1:47:30 PM&lt;br&gt;&gt;&gt;Right on. And if you don&#039;t want people to steal your car, don&#039;t go leaving it in a parking lot.&lt;br&gt;  ------------------------&lt;br&gt;&lt;br&gt;  Your analogy is slightly incorrect.&lt;br&gt;&lt;br&gt;  Putting your files on a webserver without any form of access restriction more closely resembles leaving your car in an airport or major metropolitan parking lot WITHOUT LOCKING THE CAR DOORS while you go on vacation.  And if you don&#039;t want your car stolen, then you definitely should NOT do that.&lt;br&gt;&lt;br&gt;  I run a lot of tutorial and article directory sites, so I deal with people scraping my content on an hourly basis.&lt;br&gt;&lt;br&gt;  When I put something on any of my webservers that I do not want publicly used or scraped, I use any number of well-known methods to restrict access to the data.  An .htaccess file would be one general example.&lt;br&gt;&lt;br&gt;  When I see what I know are certainly scraper bots (besides the search engines, which are generally the biggest resource hogs I deal with), I allow them to scrape anything to which I already allow public access (aka UNRESTRICTED data).  In the event that a particular bot begins to use too many resources, I simply ban the bot.  I choose to do this manually, but there are quite a few extremely good, free pieces of software that will ban overzealous bots with a great deal of automation.&lt;br&gt;&lt;br&gt;If you don&#039;t want to allow unrestricted access to your data, you can &quot;lock your doors&quot;.  So, that being said...&lt;br&gt;&lt;br&gt;&quot;Greg - 1/6/2004 10:46:00 PM&lt;br&gt;If you dont want people to use your html, dont put it on a web server. 
You yanks are truly amazing&quot;&lt;br&gt;&lt;br&gt;And for those who still insist on keeping all their data publicly accessible but continue to get upset when they find their servers being visited by scraper bots, I advise that you add a T.O.S. section to your website clearly forbidding automated access to your website (this is what Google, Yahoo, MSN, etc. do).  It won&#039;t stop the scrapers, but it will give you a much more definitive legal standing.  Of course, if you do that, then I guess Google and the other two main engines will be breaching your T.O.S. every time they come along to perform some more automated spidering and scraping.  It&#039;s a mad, mad world...&lt;br&gt;&lt;br&gt;Now that I&#039;ve said all that... don&#039;t add a scraper to NewsGator.  It would be more bloat than anything else.
</description>
		<content:encoded><![CDATA[<p>Mark &#8211; 1/30/2004 1:47:30 PM<br />
<br />>>Right on. And if you don&#8217;t want people to steal your car, don&#8217;t go leaving it in a parking lot.<br />
<br />  &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;</p>
<p>  Your analogy is slightly incorrect.</p>
<p>  Putting your files on a webserver without any form of access restriction more closely resembles leaving your car in an airport or major metropolitan parking lot WITHOUT LOCKING THE CAR DOORS while you go on vacation.  And if you don&#8217;t want your car stolen, then you definitely should NOT do that.</p>
<p>  I run a lot of tutorial and article directory sites, so I deal with people scraping my content on an hourly basis.</p>
<p>  When I put something on any of my webservers that I do not want publicly used or scraped, I use any number of well-known methods to restrict access to the data.  An .htaccess file would be one general example.</p>
<p>  When I see what I know are certainly scraper bots (besides the search engines, which are generally the biggest resource hogs I deal with), I allow them to scrape anything to which I already allow public access (aka UNRESTRICTED data).  In the event that a particular bot begins to use too many resources, I simply ban the bot.  I choose to do this manually, but there are quite a few extremely good, free pieces of software that will ban overzealous bots with a great deal of automation.</p>
<p>If you don&#8217;t want to allow unrestricted access to your data, you can &#8220;lock your doors&#8221;.  So, that being said&#8230;</p>
<p>&#8220;Greg &#8211; 1/6/2004 10:46:00 PM<br />
<br />If you dont want people to use your html, dont put it on a web server. You yanks are truly amazing&#8221;</p>
<p>And for those who still insist on keeping all their data publicly accessible but continue to get upset when they find their servers being visited by scraper bots, I advise that you add a T.O.S. section to your website clearly forbidding automated access to your website (this is what Google, Yahoo, MSN, etc. do).  It won&#8217;t stop the scrapers, but it will give you a much more definitive legal standing.  Of course, if you do that, then I guess Google and the other two main engines will be breaching your T.O.S. every time they come along to perform some more automated spidering and scraping.  It&#8217;s a mad, mad world&#8230;</p>
<p>Now that I&#8217;ve said all that&#8230; don&#8217;t add a scraper to NewsGator.  It would be more bloat than anything else.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aaron Willis</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-404</link>
		<dc:creator>Aaron Willis</dc:creator>
		<pubDate>Thu, 05 Jan 2006 23:59:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-404</guid>
		<description>I work for a web scraping company called &lt;a href=&quot;http://www.scrapegoat.com&quot; rel=&quot;nofollow&quot;&gt;ScrapeGoat&lt;/a&gt;. Our view on collecting data is that if it&#039;s publicly available, then the data by default is in the public domain. However, some companies (including most search engines) love to scrape and store everybody else&#039;s data and use it to make money, but then cry &quot;foul&quot; if anybody tries to scrape data from them.  &lt;br&gt;&lt;br&gt;It&#039;s kinda like putting a drinking fountain in a public park using the public water supply and then getting mad if anybody tries to take a drink.
</description>
		<content:encoded><![CDATA[<p>I work for a web scraping company called <a href="http://www.scrapegoat.com" rel="nofollow">ScrapeGoat</a>. Our view on collecting data is that if it&#8217;s publicly available, then the data by default is in the public domain. However, some companies (including most search engines) love to scrape and store everybody else&#8217;s data and use it to make money, but then cry &#8220;foul&#8221; if anybody tries to scrape data from them.  </p>
<p>It&#8217;s kinda like putting a drinking fountain in a public park using the public water supply and then getting mad if anybody tries to take a drink.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-403</link>
		<dc:creator>Alex</dc:creator>
		<pubDate>Thu, 08 Dec 2005 22:26:21 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-403</guid>
		<description>Take a look at SW Explorer Automation, or SWEA (http://home.comcast.net/~furmana/SWIEAutomation.htm). SWEA creates an object model (an automation interface) for any Web application running in Internet Explorer. It uses XPath expressions to extract data from Web pages, and the expressions can be defined visually using the SWEA designer.
</description>
		<content:encoded><![CDATA[<p>Take a look at SW Explorer Automation, or SWEA (<a href="http://home.comcast.net/~furmana/SWIEAutomation.htm" rel="nofollow">http://home.comcast.net/~furmana/SWIEAutomation.htm</a>). SWEA creates an object model (an automation interface) for any Web application running in Internet Explorer. It uses XPath expressions to extract data from Web pages, and the expressions can be defined visually using the SWEA designer.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dharma</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-402</link>
		<dc:creator>Dharma</dc:creator>
		<pubDate>Wed, 02 Feb 2005 17:00:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-402</guid>
		<description>Yes! I too think that screen scraping is ethical, because we can extract the required portion of the HTML page for search engines and so on. It is a great feature to have.
</description>
		<content:encoded><![CDATA[<p>Yes! I too think that screen scraping is ethical, because we can extract the required portion of the HTML page for search engines and so on. It is a great feature to have.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stephen</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-401</link>
		<dc:creator>Stephen</dc:creator>
		<pubDate>Wed, 15 Sep 2004 02:12:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-401</guid>
		<description>Right on, Stuart.&lt;br&gt;&lt;br&gt;1. How are you supposed to create a search engine without screen scraping part of the content?&lt;br&gt;&lt;br&gt;2.  I use Google&#039;s cached pages feature all the time.  Does anyone else?
</description>
		<content:encoded><![CDATA[<p>Right on, Stuart.</p>
<p>1. How are you supposed to create a search engine without screen scraping part of the content?</p>
<p>2.  I use Google&#8217;s cached pages feature all the time.  Does anyone else?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Stuart</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-400</link>
		<dc:creator>Stuart</dc:creator>
		<pubDate>Thu, 29 Jul 2004 19:07:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-400</guid>
		<description>Doesn&#039;t Google &quot;screen scrape&quot; content from all the sites in the world? If they can do it, then surely it&#039;s not illegal.
</description>
		<content:encoded><![CDATA[<p>Doesn&#8217;t Google &#8220;screen scrape&#8221; content from all the sites in the world? If they can do it, then surely it&#8217;s not illegal.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aaron</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-399</link>
		<dc:creator>Aaron</dc:creator>
		<pubDate>Sat, 27 Mar 2004 09:17:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-399</guid>
		<description>I like the idea of being able to embed content and/or services into larger applications; legal or not, it is going to happen.&lt;br&gt;&lt;br&gt;With that said, perhaps there is a way to advertise for them within our applications, possibly through an extra meta tag with some ad info.
</description>
		<content:encoded><![CDATA[<p>I like the idea of being able to embed content and/or services into larger applications; legal or not, it is going to happen.</p>
<p>With that said, perhaps there is a way to advertise for them within our applications, possibly through an extra meta tag with some ad info.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-398</link>
		<dc:creator>Mark</dc:creator>
		<pubDate>Fri, 30 Jan 2004 20:47:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-398</guid>
		<description>&gt;&gt;If you dont want people to use your html, dont put it on a web server. You yanks are truly amazing.&lt;br&gt;&lt;br&gt;Right on. And if you don&#039;t want people to steal your car, don&#039;t go leaving it in a parking lot.
</description>
		<content:encoded><![CDATA[<p>>>If you dont want people to use your html, dont put it on a web server. You yanks are truly amazing.</p>
<p>Right on. And if you don&#8217;t want people to steal your car, don&#8217;t go leaving it in a parking lot.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jesse</title>
		<link>http://www.rassoc.com/gregr/weblog/2003/03/27/screen-scraping/comment-page-1/#comment-397</link>
		<dc:creator>Jesse</dc:creator>
		<pubDate>Wed, 07 Jan 2004 06:35:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.gregrphoto.com/rassoc/gregr/weblog/2003/03/27/screen-scraping/#comment-397</guid>
		<description>&quot;Tools that scrape these sites are literally stealing money from them.&quot;&lt;br&gt;&lt;br&gt;http://dictionary.reference.com/search?q=literally
</description>
		<content:encoded><![CDATA[<p>&#8220;Tools that scrape these sites are literally stealing money from them.&#8221;</p>
<p><a href="http://dictionary.reference.com/search?q=literally" rel="nofollow">http://dictionary.reference.com/search?q=literally</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>
