Most efficient format for array data for R import?

I'm in the enviable position of being able to set up the format for my data collection ahead of time, rather than being handed some crazy format and having to struggle with it. I'd like to make sure I'm setting it up in a way that minimizes headaches down the road, but I'm not very familiar with importing into multidimensional arrays so I'd like input. It also seems like a thought exercise that others might get some use from.

I am compiling a large number of data summaries (500+) with 23 single data values for each experiment and two additional vectors that vary between 100 and 1500 data values (these two vectors happen to always match in length for each sample, but their length is different for each sample). I'm having to store all of these in an Excel sheet which I'm currently building. I want to set it up in a way that efficiently stores this data for import into an R array.

I'm assuming that the longer dimensions, which vary in length, will have the max length (1500) and a bunch of NA's at the end rather than try to keep track of ragged data in Excel.

My current plan would be to store these in long form in Excel, with data labels in the first column (dim1, dim2,...), and the data summaries in each subsequent column (a, b, c...), since this saves the most space. Using a smaller number of dimensions as an example (7 single values, 2 vectors of length 1500), the data would look like this in Excel:

     a b c...
dim1 2 5 7...
dim2 3 6 8...
dim3 6 8 2 ...
dim4 5 6 1... 
dim5 6 2 1...
dim6 0 3 8...
dim7 8 5 4...
dim8 1 1 1...
dim8 2 2 2 ...
... continued x1500
dim9 4 4 4...
dim9 5 5 5 ...
...continued x1500

Can I easily import this, using the leftmost column to identify the dimensions of the array in long form? I don't see an easy way to do this using Reshape2, but perhaps I'm missing something. Or, do I need to have the data in paired columns?

It isn't clear to me whether this format is the most efficient way to organize this data for import into a multidimensional array, or if there is a better way. Eventually there will be a large number of samples so I'd like to think through this now rather than struggle later.

What is the most painless way to import this...or, is there a more efficient way of setting it up for easier import?


Hmm.. I can't think of a case that you would have to use melt . If you keep the current format, and add a heading to the 'dim' column then you should be able to work with that data fairly easily.

If you did transpose the data on 'dim' I think it would make things a lot more difficult.

It might good to know what variable types a,b,c,etc. are in order to make a better assessment.

链接地址: http://www.djcxy.com/p/38324.html

上一篇: 从r中的excel导入小数坐标

下一篇: 用于R导入的阵列数据的最有效格式?