Website owners have the option to block the web crawler by adding a “disallow” command to a standard file on the server…
Artificial intelligence firm OpenAI has launched “GPTBot”, a new web-crawling tool it says could be used to improve future ChatGPT models.
“Web pages crawled with the GPTBot user agent may potentially be used to improve future models,” OpenAI said in a new blog post, adding that the crawled data could improve accuracy and expand the capabilities of future iterations.
A web crawler, sometimes called a web spider, is a type of bot that indexes the content of websites across the internet. Search engines such as Google and Bing use crawlers so that websites can appear in their search results.
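Well-behaved crawlers of this kind check a site's robots.txt rules before fetching pages. As an illustrative sketch (the paths and rules below are hypothetical, not OpenAI's), Python's standard-library `urllib.robotparser` shows how a crawler decides whether a given user agent may fetch a URL:

```python
import urllib.robotparser

# Parse a hypothetical robots.txt body directly,
# instead of fetching one from a live site.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: GPTBot",
    "Disallow: /private/",
])

# A compliant crawler consults these rules before each fetch.
print(rp.can_fetch("GPTBot", "https://example.com/public/page"))   # True
print(rp.can_fetch("GPTBot", "https://example.com/private/page"))  # False
```

Anything not covered by a `Disallow` rule is allowed by default, which is why the first check passes.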
OpenAI said the web crawler will collect publicly available data from the world wide web, but will filter out sources that sit behind paywalls, are known to gather personally identifiable information, or contain text that violates its policies.
Breaking 🚨
OpenAI just launched GPTBot, a web crawler designed to automatically scrape data from the entire internet.
This data will be used to train future AI models like GPT-4 and GPT-5!
GPTBot ensures that sources violating privacy and those behind paywalls are excluded. pic.twitter.com/oR3kY4buaU
— Shubham Saboo (@Saboo_Shubham_) August 7, 2023
It should be noted that website owners can block the web crawler by adding a “disallow” command to a standard file on the server.
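The standard file referred to here is robots.txt, which OpenAI's own GPTBot documentation points site owners to. A minimal fragment blocking GPTBot from an entire site would look like this (the user-agent name is OpenAI's; the rule itself is the standard Robots Exclusion Protocol syntax):

```
User-agent: GPTBot
Disallow: /
```

Replacing `/` with a narrower path, such as `/private/`, would block the crawler only from that part of the site while leaving the rest crawlable.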
