I recently learned that I did a bad thing while trying to get information about Shell gas stations. The information was available to everyone logging onto the site, but I overused it. I even learned that there are names for people like me who do this type of thing. Many of the POI files on POI-Factory are small and/or local, and the information for them is not difficult to gather. I think other files are created with assistance from whichever company they cover, because those companies are happy to get the word out about where they are. Others obviously do not want you to know where they are.
I also think that I have been frowned upon and possibly given the cold shoulder because I have openly stated that I am re-working the Shell Gas Station file. Please let me explain.
I live ½ mile from a Shell gas station at 13666 W US 84, Newton, AL 36352. The address is slightly off when viewed in Google Earth (or Maps). The coordinates in the POI-Factory file (31.308618,-85.606229) place the station 5.9 miles from its actual location, among several houses in the middle of nowhere. Coordinates that will put you in the station's driveway are (31.242309,-85.614273). Prompted by that error, I looked up how many stations were listed for Dothan, AL. None of the 8 stations around Dothan were listed, and the only other nearby one was listed under Midland City, AL. This piqued my interest, and I found many more stations that weren't listed or whose information needed to be modified and updated. I sent an update to ScottK but received no answer, and to my knowledge no updates have been made to the file since 11/14/2008. I therefore decided to make the file as clean as possible, primarily for my own use as a Shell credit card holder.
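One quick way to check how far apart two coordinate pairs are is the haversine (great-circle) formula. Below is a minimal Python sketch, assuming a spherical Earth with a mean radius of 3958.8 miles, applied to the two coordinate pairs above. It gives the straight-line separation, roughly 4.6 miles, so the 5.9-mile figure quoted above is presumably a road or map-measured distance rather than a straight-line one.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2, radius_miles=3958.8):
    """Great-circle distance in miles between two lat/lon points (degrees),
    treating the Earth as a sphere of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_miles * math.asin(math.sqrt(a))


file_coords = (31.308618, -85.606229)    # coordinates in the POI-Factory file
actual_coords = (31.242309, -85.614273)  # coordinates at the station driveway
print(f"{haversine_miles(*file_coords, *actual_coords):.1f} miles")  # prints: 4.6 miles
```

The same function can be run over a whole POI file to flag entries whose listed coordinates sit more than some chosen threshold away from a verified location.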
Findings: The original file claims 3,667 locations in all 50 states. I have been able to identify a total of 412 stations in Alabama (including the AL locations in the original file). Of those, I have verified 285 with Google Earth. The remaining 127 I can't verify (see in a picture) or locate with anything I use.
I was able to identify a total of 959 stations in Florida before being blocked. There are probably many more, but of that total I verified 543 (via the Google camera), could not locate or verify 75, and still had 341 left to look up.
To put this into perspective, that is approximately 1,371 stations in Alabama and Florida combined, which leaves only about 2,300 stations for the other 48 states (California and Texas probably have that many, if not more).
MY BURNING QUESTION: If what I did (spidering, botting, crawling) was wrong, HOW did the original file get created? I am pretty sure it was not pieced together one station at a time from submissions by individual people. I am also pretty sure that Shell Oil Company did not supply the information.
ADDENDUM: Garmin 2009 and 2010 maps do not show the station mentioned above and show only 5 of the 8 stations in Dothan, AL. Where does Garmin get the information for its POIs? I know the units aren't smart enough to identify every type of business encountered while driving happily down the road. Everyone talks about the maps being over a year old and out of date. I can prove that the same applies to at least one of the files on POI-Factory.
The factory-installed POIs on Garmin units come from Navteq. They would have gotten the data from the companies themselves and/or from agents doing the legwork.
As for being a bot, that term is generally used to refer to a computer/program that does the same task you were doing.
As for web hosts frowning on that kind of behavior, they can take a flying leap: they offer information/data, then get upset that someone is using their bandwidth.
I read your whole statement, and I have fairly good reading comprehension, but I did not see a question posed.
MY BURNING QUESTION: If what I did (spidering, botting, crawling) was wrong, HOW did the original file get created?
Nice write-up, and I agree: Curt is providing valuable information that Shell is seemingly too lazy to post itself.
The information is posted, but in a format that gives out only a small amount of data (a few stations) at a time. They only got mad when he kept hitting their system over and over in a short period of time.
As for the question: D'oh. As I said, "fairly good," not perfect.
Who's frowning upon you or giving you the cold shoulder? If it's the original POI file author, I can understand his gripe to some degree. He/she probably put a lot of work into the file and is proud of it. It might make them feel a little hurt/insulted that someone comes along and says they are going to rework it. My suggestion would be to let the author know there are some inconsistencies in the file and that you would like to help him/her.
That being said, what you are doing is a great help and is appreciated by just about everyone. If anyone has issues with someone trying to make things better or more accurate, then they probably shouldn't be here.
Note: I reread your comment (more carefully this time) and see that you did try contacting the original author. Next time, you might want to explain the situation to Miss POI and ask what the best course of action would be. Sorry about that.
Just to clarify comments that I made in response to other comments on another thread.
Spidering, scraping, etc. refer to using some sort of automated means (a program, a script, or something similar) to obtain data from a website.
It does not refer to someone manually visiting a website and getting data that is readily available.
Automated means are frowned upon because they can place added stress on the servers, and they typically obtain the data in ways and quantities that the owner of the data did not intend.
But again, I did not intend to imply that a user simply visiting a site repeatedly and obtaining the data manually was a bad thing. However, the website owner and the owner of the data can restrict the data in any manner that they see fit, as well.
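The thread does not name a specific mechanism, but one common way site owners state limits on automated access is a robots.txt file, sometimes with a non-standard Crawl-delay line. Below is a minimal Python sketch, assuming a hypothetical robots.txt for a hypothetical store-locator site (example.com, the /locator/ path, and the agent name are all made up for illustration). It uses the standard library's urllib.robotparser to skip disallowed URLs and space the remaining requests by the stated delay. It only plans the requests; it does not fetch anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a hypothetical store-locator site.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /locator/
"""


def plan_requests(urls, robots_lines, agent="example-bot", fallback_delay=5.0):
    """Return (offset_seconds, url) pairs for the URLs robots.txt allows,
    spaced by the site's Crawl-delay, or fallback_delay if none is given."""
    rp = RobotFileParser()
    rp.parse(robots_lines)  # parse lines directly instead of downloading the file
    delay = rp.crawl_delay(agent) or fallback_delay
    plan = []
    for url in urls:
        if rp.can_fetch(agent, url):  # skip anything the owner has disallowed
            plan.append((len(plan) * delay, url))
    return plan


if __name__ == "__main__":
    urls = [
        "https://example.com/locator/search?zip=36352",
        "https://example.com/stations/al",
        "https://example.com/stations/fl",
    ]
    for offset, url in plan_requests(urls, ROBOTS_TXT.splitlines()):
        print(offset, url)
```

With the sample file, the /locator/ URL is skipped and the two remaining requests are scheduled 10 seconds apart. Spacing requests out like this is one way to avoid the rapid, repeated hits that tend to get a client blocked, though a site owner can still restrict access however they see fit.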
Copyright © 2006-2021