Removing extra characters in CSV files for use in spreadsheet software?

 

I am using the Reast Areas with Restrooms POI file, both on my Garmin, and also as a CSV file with the travel planner website Furkot.

In the CSV file, the lines have certain data enclosed in double quotes, brackets, and some separated by pipes:

-147.814902,61.800549,"AK,AK-1,EB/WB,MATANUSKA GLACIER STATE REC PARK REST AREA GLENN HWY,MM101","[RR,PT,,Pets,HF]|SCENIC VIEW"

I am sure these have to do with importing into GPS devices, but they mess up the display of columns in spreadsheet software (Apple Numbers and Google Sheets, in my case).

To my eye, the first double quote preceding AK, and the closing double quote after HWY, are unnecessary for a CSV file and must be some sort of formatting for GPS devices only.

Or perhaps some Excel-specific code?

The brackets I do not understand at all. Nor the pipes.

I’d love to be able to parse these CSVs as spreadsheets, so I can extract eastbound or westbound data only for use with Furkot.

I will probably have to build a few formulas to convert these, but thought I would check here to see if anyone had already done so or had any tips on the best way. I can already see there are a lot of variances in each entry, so covering all bases is going to be a bit of a process.

Standard Coding.

I believe the quotes is standard coding for CSV files that have text fields when the text itself contains commas.

--
Alan - Android Auto, DriveLuxe 51LMT-S, DriveLuxe 50LMTHD, Nuvi 3597LMTHD, Oregon 550T, Nuvi 855, Nuvi 755T, Lowrance Endura Sierra, Bosch Nyon

Double Quotes needed

alandb wrote:

I believe the quotes is standard coding for CSV files that have text fields when the text itself contains commas.

Alandb is correct.
Removing the Double Quotes will cause an extra field to appear on that line because you now have a comma that the GPS recognizes as a field delimiter.

If you delete the double quotes, you will have to delete the commas as well.

The POI files I create have no commas.

EDIT:
I just checked - the double quotes also protect imbedded line feeds or next line markers.

I suggest that you get Open Office (FREE) or MS Office, as those programs will handle the double quote issue with no problems.

--
Metricman DriveSmart 76 Williamsburg, VA

Assuming you've worked

Assuming you've worked through what others have suggested you can take "AK and change it to just AK with some simple keystrokes in Excel. Simply do the following:

Open the spreadsheet and select the "Home" tab

At the far right you will see an option captioned "Find and Select". Click that and select "Replace" from the drop down menu.

Lets take an example of getting rid of "AK and just replacing it with AK. In the "Find what" field put in "AK; in the "Replace with" put in just AK. Click "Replace All" and all the instances of "AK will become AK. Use some thought making your choices. As an example say I put in "AK in the "Find what" field and leave the "Replace with" field empty. Every instance of "AK will be removed.

--
John from PA

Yes understood that the

Yes understood that the double-quotes protects text that contains commas, but not sure why that would be desirable for the sections of each entry that do this.

Why would you want the EB/WB, the name of the rest area, and the mile marker all in one column?

Obviously this was done for a reason, but I don’t understand why. I want them to be in separate columns so I can sort the table by EB/WB. Bundling the data into the name column makes no sense to me.

But yes I understand what enclosing data in double quotes does, I am just trying to understand why it was done here. It doesn’t make sense to me to group those data points into one column.

Thanks

John from PA wrote:

Assuming you've worked through what others have suggested you can take "AK and change it to just AK with some simple keystrokes in Excel. Simply do the following:

Open the spreadsheet and select the "Home" tab

At the far right you will see an option captioned "Find and Select". Click that and select "Replace" from the drop down menu.

Lets take an example of getting rid of "AK and just replacing it with AK. In the "Find what" field put in "AK; in the "Replace with" put in just AK. Click "Replace All" and all the instances of "AK will become AK. Use some thought making your choices. As an example say I put in "AK in the "Find what" field and leave the "Replace with" field empty. Every instance of "AK will be removed.

Understood on how to do this for a single entry, but there are hundreds of them. Will require a more complex formula, and likely regex to catch all variations.

Mostly I am trying to understand why this file was built this way in the first place.

Clarification

metricman wrote:
alandb wrote:

I believe the quotes is standard coding for CSV files that have text fields when the text itself contains commas.

Alandb is correct.
Removing the Double Quotes will cause an extra field to appear on that line because you now have a comma that the GPS recognizes as a field delimiter.

If you delete the double quotes, you will have to delete the commas as well.

The POI files I create have no commas.

EDIT:
I just checked - the double quotes also protect imbedded line feeds or next line markers.

I suggest that you get Open Office (FREE) or MS Office, as those programs will handle the double quote issue with no problems.

I guess I didn’t explain myself properly. I don’t want those double-quote enclosed data sections to be treated as one column entry. They indeed display that way when I open the CSV in a spreadsheet app.

I want each of those comma-separated items to be in their own column.

I am trying to understand why it was built like this in the first place. What purpose does it serve to have all that data inaccessible in one grouped column entry?

The underlying problem is

The underlying problem is the CSV structure only allows a maximum of 4 fields or areas. In order to cram more information (data) into those fields you have to get creative with the use of formatting identifiers.

--
Illiterate? Write for free help.

Interesting

Box Car wrote:

The underlying problem is the CSV structure only allows a maximum of 4 fields or areas. In order to cram more information (data) into those fields you have to get creative with the use of formatting identifiers.

Aha, so this is a limitation of how GPS units read CSV data? Interesting.

Yes, and you really don't

Yes, and you really don't want to use Excel to make a global change like removing the quote marks. Use Notepad or Notepad++ and remove the quotes that way. Save the file THEN open it in Excel. From there you can remove the information you don't want.

The changes I have to make to the file to conform to my desired output are why I haven't updated the US Rest Area file in my devices in two years. The Canada Rest Area file? It would be nice if I could simply get a list from Canada, but the individual provinces and territories deal with that information, and, unless they changed things, make it virtually impossible to distill into something workable.

--
"Anyone who is capable of getting themselves made President should on no account be allowed to do the job." --Douglas Adams

QUOTES ??????

Not sure what file you are looking at. I maintain this file and I don't use Quotes in the file. I only use commas in column C. This allows the user of the file to sort on State,Hwy, Direction of travel,POI Name, and the last is MIle Marker if known. For column D I use brackets to enclose what the rest area has available [RR,PT,VM,Pets,HF]. If some of these are not available, I just use an extra comma. This keeps this listing consistent. Again, a user can sort on what is available. I use the Pipe in column D to separate any special info such as Scenic, RV dump etc.

NO SPECIAL CODE OR FORMULAS. Just the configuration I use for ease of updating.
Using a spread sheet a user can easily sort on the data for just about anything.

Column C
State
Hwy
Direction of travel
Name
Mile Marker if known

Column D
RR rest rooms
PT picnic tables
VM vending machines
Pets pets allowed
HF handicap facilities
Pipe Special such as Scenic views,RV Dump, No trucks,etc

Using the example you cited and copying that line item from my file, there are no quotations.
COLUMN C
AK,AK-1,EB/WB,MATANUSKA GLACIER STATE REC PARK REST AREA GLENN HWY,MM101
Using a spread sheet this gives 5 columns of data
COLUMN D
[RR,PT,,Pets,HF]|SCENIC VIEW
Using a spread sheet this gives 5 columns of data
The third column would be blank because this rest area does not have vending machines
After the pipe | just special information

AGAIN NO SPECIAL CODES OR FORMULAS

@Mahoney

Clearly I misspoke. There aren't any quote marks in Excel, but if you open up the CSV in Notepad or Notepad++ there will be quotes surrounding the third and fourth columns of data. This is a property of CSV itself and is not indicative of an error on your part. In order to properly edit the unwanted info out of them you have to remove the quotes so the data within is populated in separate columns when opened in Excel.

I have a lot of editing to do to the file in order to make it into something I feel is useful. That's on me however.

--
"Anyone who is capable of getting themselves made President should on no account be allowed to do the job." --Douglas Adams

@Strephon_Alkhalikoi

Strephon_Alkhalikoi wrote:

Clearly I misspoke. There aren't any quote marks in Excel, but if you open up the CSV in Notepad or Notepad++ there will be quotes surrounding the third and fourth columns of data. This is a property of CSV itself and is not indicative of an error on your part. In order to properly edit the unwanted info out of them you have to remove the quotes so the data within is populated in separate columns when opened in Excel.

I have a lot of editing to do to the file in order to make it into something I feel is useful. That's on me however.

Quotes are needed to keep extra columns from being fed to the GPS, as "commas" tell the GPS that that is the end of the data for that column. The double quotes tell the GPS to treat it as text, not a column delimiter.

Also, using ALT+Enter in Excel will give you a Hidden "Next Line" Symbol in the CSV file.

All Garmins will only process the first 4 columns in the file and discard the rest, if there are any.

It is best to edit the file in Excel and then save it the way you want it to display on your Garmin.

Be sure to save it as a "CSV (MS_DOS) (*.csv)" file when you do a "Save As".

--
Metricman DriveSmart 76 Williamsburg, VA

The main things I need to

The main things I need to remove in it are the state and highway locations, along with mile markers. Some of it is moving data from column D to C, notably the hours of operation for rest areas that have them. The rest of it is formatting. For example, changing the UPPERCASE entries to something that isn't shouting at me. The rest of it is changing the abbreviations for the amenities into something that can be seen at a glance. Not a knock on Mahoney as he inherited the file and retained the formatting, but the file as released isn't something that I believe fits that objective.

So when I say I have a lot of editing to do to make the file usable to me, I'm not kidding. As for manipulating the data, I strip the quotes using Notepad++ so everything is broken down into columns. I put the quotes back in before saving it as a CSV.

I'm well versed in manipulating the file as this isn't my first rodeo. However, in responding to Mahoney I unintentionally talked down to him. I know it's late but I apologize to him for that.

--
"Anyone who is capable of getting themselves made President should on no account be allowed to do the job." --Douglas Adams

One way around quotes

george22 wrote:

...But yes I understand what enclosing data in double quotes does, I am just trying to understand why it was done here. It doesn’t make sense to me to group those data points into one column.

I agree with you not putting all that disparate information in one column. I solved that issue by doing this:

-81.006280,41.740424,Debonne Vineyards,7840 Doty Rd•Madison OH 44057•440-466-3485

This gives me four column without quotes while maintaining readability.

Phil

--
"No misfortune is so bad that whining about it won't make it worse."