Re: Sorting characters of the text - script doesn't work as expected
- Subject: Re: Sorting characters of the text - script doesn't work as expected
- From: ILJA SHEBALIN <email@hidden>
- Date: Mon, 17 Apr 2017 16:44:15 +0300
Hi,

The script, even when modified, returns empty lists when checked against non-Greek "text items", i.e. English characters (let's assume the text isn't in Greek). The script should work in both directions: it should print 0 for Greek characters and the count for non-Greek ones, right? Log commands beyond the very first one, for the "ParagraphCharacters" variable, are skipped by the script (I'm posting a snippet of its logging result too). Isn't it designed to return values for non-Greek characters as well? Baffling at best.

The script (modified to take into account suggestions from other users) and its results (a pastebin link):
About /robots.txt

In a nutshell

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

It works like this: a robot wants to visit a Web site URL, say http://www.example.com/welcome.html. Before it does so, it first checks for http://www.example.com/robots.txt, and finds:

    User-agent: *
    Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.

There are two important considerations when using /robots.txt:

- Robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention.
- The /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

So don't try to use /robots.txt to hide information.

See also:

- Can I block just bad robots? (http://www.robotstxt.org/faq/blockjustbad.html)
- Why did this robot ignore my /robots.txt? (http://www.robotstxt.org/faq/ignore.html)
- What are the security implications of /robots.txt? (http://www.robotstxt.org/faq/nosecurity.html)

The details

The /robots.txt is a de-facto standard, and is not owned by any standards body. There are two historical descriptions:

- the original 1994 document "A Standard for Robot Exclusion" (http://www.robotstxt.org/orig.html)
- a 1997 Internet Draft specification, "A Method for Web Robots Control" (http://www.robotstxt.org/norobots-rfc.txt)

In addition there are external resources:

- HTML 4.01 specification, Appendix B.4.1 (http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1)
- Wikipedia - Robots Exclusion Standard (http://en.wikipedia.org/wiki/Robots.txt)

The /robots.txt standard is not actively developed. See "What about further development of /robots.txt?" (http://www.robotstxt.org/faq/future.html) for more discussion.

The rest of this page gives an overview of how to use /robots.txt on your server, with some simple recipes. To learn more see also the FAQ (http://www.robotstxt.org/faq.html).

How to create a /robots.txt file

Where to put it

The short answer: in the top-level directory of your web server.

The longer answer: when a robot looks for the "/robots.txt" file for a URL, it strips the path component from the URL (everything from the first single slash), and puts "/robots.txt" in its place.

For example, for "http://www.example.com/shop/index.html", it will remove the "/shop/index.html", replace it with "/robots.txt", and end up with "http://www.example.com/robots.txt".

So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: "robots.txt", not "Robots.TXT".

See also:

- What program should I use to create /robots.txt? (http://www.robotstxt.org/faq/editor.html)
- How do I use /robots.txt on a virtual host? (http://www.robotstxt.org/faq/virtual.html)
- How do I use /robots.txt on a shared host? (http://www.robotstxt.org/faq/shared.html)

What to put in it

The "/robots.txt" file is a text file, with one or more records. It usually contains a single record looking like this:

    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /~joe/

In this example, three directories are excluded.

Note that you need a separate "Disallow" line for every URL prefix you want to exclude; you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records.

Note also that globbing and regular expressions are not supported.

What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Here follow some examples:

To exclude all robots from the entire server

    User-agent: *
    Disallow: /

To allow all robots complete access

    User-agent: *
    Disallow:

(or just create an empty "/robots.txt" file, or don't use one at all)

To exclude all robots from part of the server

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/
    Disallow: /junk/

To exclude a single robot

    Disallow: /

To allow a single robot

    Disallow:

    Disallow: /

To exclude all files except one

This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:

    Disallow: /~joe/stuff/

Alternatively you can explicitly disallow all disallowed pages:

    Disallow: /~joe/junk.html
    Disallow: /~joe/foo.html
    Disallow: /~joe/bar.html
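
The lookup rule described in the robots.txt text above (strip the path component from the URL, put "/robots.txt" in its place, then treat each "Disallow" value as a URL prefix) can be sketched in Python. This is only a minimal sketch under stated assumptions, not a full parser: the function names robots_txt_url and is_disallowed are made up for illustration, User-agent matching is left out, and the prefix list is assumed to have been read from the file already.

```python
from typing import Iterable
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a robot would check for page_url.

    Hypothetical helper: keeps scheme and host, and replaces the path
    (plus any query or fragment) with "/robots.txt".
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_disallowed(path: str, disallow_prefixes: Iterable[str]) -> bool:
    """Return True if path starts with any non-empty Disallow prefix.

    Hypothetical helper: an empty "Disallow:" value excludes nothing,
    and no globbing or regular expressions are applied.
    """
    return any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)


if __name__ == "__main__":
    print(robots_txt_url("http://www.example.com/shop/index.html"))
    prefixes = ["/cgi-bin/", "/tmp/", "/~joe/"]
    print(is_disallowed("/cgi-bin/search", prefixes))
    print(is_disallowed("/index.html", prefixes))
```

For real use, Python's standard library already ships a parser in urllib.robotparser; the sketch above only mirrors the two steps the text spells out.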
Do not post admin requests to the list. They will be ignored.
AppleScript-Users mailing list (email@hidden)
Help/Unsubscribe/Update your Subscription:
Archives: http://lists.apple.com/archives/applescript-users
This email sent to email@hidden