Entering tDAR Coding Sheets

Coding Sheets in tDAR

The information that allows tDAR to decode all of the coded values for a given column of your spreadsheet or database represents a "Coding Sheet" for that column. Each column that is a coded integer, real, or string needs to have a meaningful value associated with the code in order to make the spreadsheet useful.  Creating this association is the purpose of the coding sheet.  You will need a separate coding sheet for each coded column in the spreadsheet.  You use New in the main menu to create these coding sheets in tDAR; a coding sheet would usually reside within the same project that contains your database or spreadsheet. When you create a coding sheet in tDAR, you need to give it a title.  Use something that identifies your project and the variable being coded as when you need it you will only see the title.  For example:

Heshotauthla Fauna Coding Sheet - Taxon
Heshotauthla Ceramic Coding Sheet - Paste

There are two ways to get the coding sheet content into tDAR's coding sheet resource. As you create the coding sheet resources in tDAR, the Submit As box near the bottom of the form allows you to choose whether to upload an Excel (xls) or CSV file from your computer, or to enter (or cut and paste) the lines that represent the coding sheet into a text box on the coding sheet web entry form. Depending upon whether you choose to upload a file or enter the information in a text box, tDAR will allow you to browse to locate the file or will supply the text box to type (or cut and paste) into. Unless it is a very simple coding sheet you will want to maintain it on your computer rather than just type it in, in case you need to change it later.

In tDAR, a coding sheet has as code (e.g., 107), a value (e.g., "St. Johns Black-on-red"), and optionally a description (e.g., "matte, subglaze and glaze varieties included").  For a given coding sheet, if all of the codes are integers, they will be treated as numbers.  Otherwise codes are treated as strings. Thus, leading zeros on integer codes are ignored, so 001 and 01 and 1 are interpreted the same (and appear as 1). Codes can also be standardized string values, including strings of numbers that include decimal points.  Thus, Flr and Fil could be context codes and 01.1, 1.1, and 1.2.1 could be also all be codes that would be expressed as strings.  However, string codes must match exactly.  Thus, with a string code flr and Flr are different and 01.1 and 1.1 are different. The values and optional descriptions can have embedded spaces and can include most special characters (this is not true for ontologies, but never mind that for now). The order of the lines (and hence the values) within the coding sheet doesn't matter. The codes need not be in numerical or alphabetical order.

If you are creating a coding sheet from scratch or have one already in Excel format, the easiest thing to do is to create an Excel file for each coding sheet.  Each Excel file will have only these two or three columns, in this order (code, value, description), no headings, and nothing after the coding sheet information.  Save these as xls files (not as Excel 2007 xlsx files).  You can then upload these files directly into tDAR coding sheets.

Alternatively you can create a text file in csv (comma separated value) format.  You can do this from scratch or can edit the information in an existing coding key document into this form.

In a tDAR csv format coding sheet each line has a code followed by a comma, followed by is value, followed optionally by a description (e.g. for how this value is distinguished). For example,

1,Male
2,Female
9,Indeterminate,No sex distinctive characteristics

Is a perfectly good coding sheet consisting for three line for the column SEX.  The following also works:

M,Male
F,Female
?,Indeterminate,No sex distinctive characteristics

The values and optional descriptions in a csv file cannot have double quotes within them (if you must, they must appear as two quotes in a row, i.e.,"") and if they include a comma then the whole value needs to be enclosed in double quotes.

Converting a word processed or pdf coding key into a set of tDAR CSV coding sheets

If you have a coding key already typed up, you will want to convert that into csv format to avoid tedious retyping. The goal is to get from the text of a formatted document that describes a coding system into tDAR coding sheets.

Coding Key in a Spreadsheet

If your coding key sheet was created in Excel or another spreadsheet, you need to arrange it so that the codes are in the first column (A), their values in the second (B), and more elaborate descriptions are optionally in the third column. There should be nothing to the right of the third column. If you have multiple coding sheets in one spreadsheet you can select the rows that correspond to your coding sheet and say "Save As" to save a file with just those rows.  If you are using Excel, you can just save the file as a normal Excel xls file in an appropriate folder on your computer.  tDAR will read these, expecting the information in two or three columns.  If you aren't using Excel (this also works if you are) you should Save As and select "CSV" format and name and save the file. When you Save As into CSV in Excel you will get a message about losing features, just say OK.

If your coding key is an Adobe Acrobat (pdf) or word processed document, conversion is a bit more work but for a long coding key it will probably be worth it. You can't save a either a word processed (e.g. Word) or pdf file directly to CSV.

Coding Key in a Word Processor Document

What you want to do with a word processed file is open the document and use the word processor's search and replace function to help edit the document into a series of sets of lines each of which represents a coding sheet in CSV format. For example, you will want to convert tabs to spaces or commas (depending on where they are, convert multiple spaces to single spaces, eliminate leading spaces, and eliminate text within the lines of a specific coding key that should not be part of the key. Note that Word and most word processors have means to replace formatting characters such as tabs (in Word 2007, select replace and then click on the Find tab of teh window that opens. At the bottom of this window click on "More" and then click on Format to insert a special charcetr into the find box). You then click on the replace tab and enter the replacement text.  You then use Save As to save the modified coding key document in a text format. When you Save AS you must first selected Plain text (.txt) in the dropdown box of formats in Word (or perhaps ASCII format in other word processors). If Word asks, "Windows Default" is OK.

Next you will want to open the plain file in a text editor and fix any problems. Things may show up in the text editor that you don't see in the word processor's view. For Windows machines the supplied text editor is Notepad which is found in the Accessories folder of All Programs. (There are a other much better ones free on the web. One that I have used is http://download.jgsoft.com/editpad/SetupEditPadLite.exe.) Once the document is in the proper format, be sure to save it (with teh default txt extension is fine).

If your text file includes multiple coding sheets (i.e. for multiple columns) then it is easiest to cut and paste each subset into the appropriate tDAR coding sheet.To do this, you create the coding sheet in tDAR, name it, and then where it says Submit As select manual entry into a text box.  Then a text box will open at the bottom of the form.  You can then go back to your text document, select the lines corresponding to this particular coding sheet, copy them, return to the tDAR form and paste them into the text box.   Then submit the coding sheet. 

Alternately you can save each set of lines in the text document as a separate file.  The easiest way to do this is to open another instance of Notepad with a blank document and cut or copy the  lines for a single coding sheet to the blank Notepad document.  Then, for the new document,  you will want to Save As a file name with a CSV extension, such as "species codes.csv". In this case, on the tDQAR web form, you use the upload rather than manual entry into a text box option in tDAR. You then browse to the file on your computer, select it, and submit the tDAR form. 

To convert a pdf document (that has been character recognized), the process is similar. You will want to select all the text in the document, then open a text editor such as Notepad (see above) and paste the text into it. You can then use search and replace and edit the document into the proper format as you would have done, above, with your word processor.

Once You have Submitted the Coding Sheet

Be sure to look at the result. If you had a comma inside a value, you will see that the comma is gone and the text after the comma now appears as the description. You will need to go back and enclose the value in double quotes or eliminate the comma.  Remember of there are any characters in the codes other than numbers, it will treat the the all the codes in that spreadsheet as strings and the match must be exact.  If a code seems to have disappeared, your coding sheet probably didn't have a comma after the value. 

Note for more Geeky Types

If you have a lot of consistently lines that need to be converted, you may want to find a fancier text editor that deals with regular expressions that will allow MUCH more complex search and replace operations. I use TextPad ($16.50) that I like a lot http://www.textpad.com. For example, you can do a single search and replace operation that would convert lines of the form:

Bunny (sylvilagus)      010
Jackrabbit (lepus)       020

To a proper coding sheet form:

10,Sylvilagus sp. (bunny)
20,Lepus sp. (jackrabbit)

However, there is a learning curve here.