In his blog post, How to split one data set into many, Chris Hemedinger showed how to subset or split SAS data sets based on the values of categorical variables. For example, based on a value of variable REGION you may split a data set MARKETING into MARKETING_ASIA, MARKETING_AMERICA, MARKETING_EUROPE, and so on.
In some cases, however, we need to split a large data set into many, not by the values of a subsetting variable, but by a number of observations, in order to produce smaller, more manageable data sets. Such an approach can be dictated by restrictions on the data set size imposed by hardware (memory size, transmission channel bandwidth, etc.), processing time, or user interface convenience (e.g. search results displayed by pages).
For instance, we might need to split a data set into smaller tables of K observations or fewer each, or to split a data set into S equal (or approximately equal) pieces.
Also, we might need to split a data set into sequentially selected subsets where the first K observations go into the first data set, the second K observations go into the second data set, and so on. Alternatively, we might need to randomly select observations from a data set while splitting it into smaller tables.
This blog post provides possible coding solutions for such scenarios.
Splitting a data set into smaller data sets sequentially
Let’s say we need to split a data set SASHELP.CARS (number of observations N=428) into several smaller data sets. We will consider the following two sequential observation selection scenarios:
1. Each smaller data set should have a maximum of K observations.
2. There should be S smaller data sets of approximately the same size.
In the first scenario, we would ideally like to split the data set into pieces of exactly K observations each, but this is not always possible because the quotient of dividing the number of observations N in the original data set by K is not always a whole number. Therefore, we will split it into several smaller data sets of K observations each, except that the last smaller data set will have the number of observations equal to the remainder of dividing N by K (when that remainder is not zero).
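As an illustration of the sequential split into data sets of at most K observations each, here is a minimal sketch. It is one possible approach rather than a definitive implementation. The macro name split_by_k, its parameters inds, k and prefix, and the value K=100 are assumptions made for this example. With N=428 and K=100 we have 428 = 4*100 + 28, so the sketch should produce five data sets: four with 100 observations and a last one with the remaining 28.

```sas
/* Hypothetical helper: split &inds into data sets of at most &k observations */
%macro split_by_k(inds=sashelp.cars, k=100, prefix=cars_);
   %local n s i first last;

   /* N = number of observations in the source data set */
   data _null_;
      if 0 then set &inds nobs=n;
      call symputx('n', n, 'L');
      stop;
   run;

   /* S = number of smaller data sets = N/K rounded up */
   %let s = %sysevalf(&n / &k, ceil);

   %do i = 1 %to &s;
      %let first = %eval((&i - 1) * &k + 1);
      %let last  = %eval(&i * &k);

      /* OBS= may exceed N for the last piece; SAS then reads to the end */
      data &prefix&i;
         set &inds (firstobs=&first obs=&last);
      run;
   %end;
%mend split_by_k;

/* Assumed example value K=100: creates WORK.CARS_1 through WORK.CARS_5 */
%split_by_k(inds=sashelp.cars, k=100, prefix=cars_)
```

The FIRSTOBS= and OBS= data set options select each block by observation number, so the source data set is read once per output data set. For very large tables, a single DATA step that writes to all output data sets in one pass may be preferable.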