US CVS Pharmacies from Switchboard


I posted on my website an automatically generated file for all (well, almost :) ) of the CVS pharmacies that Switchboard lists. The file is comma separated and has:

Name, address, city, state, zip, lat, lon

I haven't gone through the file to check it thoroughly, so someone will need to go in and check the listings.
There might be errors as this was all pulled straight from HTML and things can get buggered up sometimes.

I am also going to post the program I wrote to do this, so you can run your own searches and it will pull the results to a file.

Data File: CVSPharmacy.txt
Program: Google Sites won't let me upload an EXE, so I had to zip it... sad


Good Job

Great job with mining the data. From a quick glance through the file, it does need a lot of clean-up before it can be used as input. I noticed several places with sections of your script embedded - or at least there were when I imported it into Excel.

I also noted a lot of duplicates. Many were there because the address was spelled differently in different listings, as in Street vs. St.
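One way to catch those spelling variants is to normalize common street-suffix spellings before comparing addresses. Here's a rough Python sketch; the suffix table is illustrative, not complete, and the column order is assumed from the original post (Name, address, city, state, zip, lat, lon):

```python
import re

# Rough sketch: collapse common street-suffix spellings (Street -> St,
# Avenue -> Ave, ...) so near-duplicate addresses compare equal.
# Column order assumed: Name, address, city, state, zip, lat, lon.
SUFFIXES = {
    "street": "st", "avenue": "ave", "boulevard": "blvd",
    "drive": "dr", "road": "rd", "lane": "ln",
}

def normalize(addr):
    words = re.split(r"\s+", addr.lower().strip())
    return " ".join(SUFFIXES.get(w.rstrip("."), w.rstrip(".")) for w in words)

def dedupe(rows):
    seen, kept = set(), []
    for row in rows:
        key = (normalize(row[1]), row[3].strip().lower())  # address + state
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

A pass like this won't catch every variant (misspellings, missing suite numbers), so a manual scan is still needed afterward.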

Overall, it looks like a fun project for someone to take on and with the data provided it will cut many hours from the project.

ɐ‾nsǝɹ Just one click away from the end of the Internet

Good luck

Good luck with getting the data organized and a POI file created. I certainly will download the final product, as it's always good when traveling to know where to find a nationwide pharmacy, just in case I leave my prescriptions at home ...

"Life is a journey - enjoy the ride!" Garmin nuvi 255

I'll work on that.

That looks like a fun job. I'll see if I can get this cleaned up. Am I supposed to end up with a .csv or a .gpx?

In this file there are a bunch of duplicate locations

For example, one CVS has the pharmacy, the ATM, and the front store each listed separately.

Can I delete the duplicates - or is this information that someone will want?
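If deleting them turns out to be fine, a pass like the following rough Python sketch could do most of it automatically. The function name is mine, the column order is assumed from the original post, and it keeps one entry per coordinate, preferring the line whose name mentions the pharmacy:

```python
# Rough sketch: group rows by rounded lat/lon and keep only one line
# per location, preferring the "pharmacy" entry when a store also has
# ATM and front-store lines at the same spot. Column order assumed:
# Name, address, city, state, zip, lat, lon.
def collapse_departments(rows, places=4):
    groups = {}
    for row in rows:
        key = (round(float(row[5]), places), round(float(row[6]), places))
        groups.setdefault(key, []).append(row)
    kept = []
    for entries in groups.values():
        pharmacy = [r for r in entries if "pharmacy" in r[0].lower()]
        kept.append(pharmacy[0] if pharmacy else entries[0])
    return kept
```

Rounding to four decimal places (roughly 10 m) is a guess at what counts as "the same spot"; adjust to taste.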

The script that you see

The script that you see is actually HTML code from the website itself. Sometimes the code is written a little differently than I expect and my reader fails on it. If the site changes its HTML wording by as much as one character, it can render the whole reader useless....
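For what it's worth, a tag-based HTML parser is less sensitive to that kind of one-character change than a literal string match, since it doesn't care about quoting style or extra whitespace. A small Python sketch using the standard library (the `result-name` class is made up for illustration):

```python
from html.parser import HTMLParser

# Sketch: extract text from spans with a given class. The parser keys
# on tag and attribute, so spacing/quoting changes in the page's HTML
# don't break it the way an exact string search would.
class NameExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []
    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "result-name") in attrs:
            self.in_name = True
    def handle_endtag(self, tag):
        if tag == "span":
            self.in_name = False
    def handle_data(self, data):
        if self.in_name and data.strip():
            self.names.append(data.strip())

p = NameExtractor()
# Both spellings parse to the same result, unlike a literal match:
p.feed('<span class="result-name">CVS Pharmacy #101</span>')
p.feed("<span  class='result-name' >CVS Pharmacy #102</span>")
```

It still breaks if the site renames the class or restructures the page, but it survives the small cosmetic edits that kill exact matching.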

Glad it will help... one problem though.

I have been told by other posters that we aren't supposed to post data collected in this way. I am trying to get this figured out with Miss POI and I'll let you know what I find out.

As a follow-up

I didn't state I would undertake the project of creating a POI file from the data. I stated it was a good job of mining the information and noted there were problems with the file. Nothing major, just things that would take more time to resolve than I was willing to spend.

For someone wanting to try, you can import the data into Excel and then begin to clean it up. I recommend that once you have the data imported, you sort on state and longitude. This will group stores together so you can look quickly for duplicate addresses.
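The same sort-by-state-then-longitude pass can be scripted instead of done in Excel. A rough Python sketch, with the column order assumed from the original post and hypothetical file names:

```python
import csv

def sort_for_dupe_scan(in_path, out_path):
    """Sort rows by state, then longitude, so stores at nearly the
    same spot land on adjacent lines for a quick visual dupe scan.
    Column order assumed: Name, address, city, state, zip, lat, lon."""
    with open(in_path, newline="") as f:
        rows = list(csv.reader(f))
    rows.sort(key=lambda r: (r[3], float(r[6])))
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

This only reorders the file; spotting and deleting the actual dupes is still a manual step.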

There are over 5,000 lines in the file, so it will take some diligent work to go through it eliminating dupes. No need to reformat the file, and if you use TurboCC's Extra POI Editor you can build both a CSV and a GPX by specifying which columns hold the data you need. The program will then put all the pieces in the right order for you - automagically.

ɐ‾nsǝɹ Just one click away from the end of the Internet