linux - How to delete the first subset of each set of columns in a data file?
I have a data file with more than 40,000 columns. In the header, each column's name begins with c1, c2, ..., cn, and each set of c has one or several subsets; for example, c1 has 2 subsets. I need to delete the first column (subset) of each set of c. For example, if the input looks like this:
Input:

    c1.20022 c1.31012 c2.44444 c2.87634 c2.22233 c3.00444 c3.44444
    1 1 0 1 0 0 0 1
    2 0 1 0 0 1 0 1
    3 0 1 0 0 1 1 0
    4 1 0 1 0 0 1 0
    5 1 0 1 0 0 1 0
    6 1 0 1 0 0 1 0
I need output like this:

    c1.31012 c2.87634 c2.22233 c3.44444
    1 0 0 0 1
    2 1 0 1 1
    3 1 0 1 0
    4 0 0 0 0
    5 0 0 0 0
    6 0 0 0 0
    7 1 0 0 0
Any suggestions, please?
Update: what should I do if there are no spaces between the digits in a row (which is the real situation in my data set)? I mean the real data looks like this. Input:

    c1.20022 c1.31012 c2.44444 c2.87634 c2.22233 c3.00444 c3.44444
    1 1010001
    2 0100101
    3 0100110
    4 1010010
    5 1010010
    6 1010010
And the output:

    c1.31012 c2.87634 c2.22233 c3.44444
    1 0001
    2 1011
    3 1010
    4 0000
    5 0000
    6 0000
    7 1000
Perl solution: it first reads the header line, uses a regex to extract the part of each column name before the dot, and builds a list of the column numbers to keep. It then uses those indices to print only the wanted columns from the header and from the remaining lines.
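
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    # Read the header line and split it into column names.
    my @header = split ' ', <>;
    my $last = q();
    my @keep;
    for my $i (0 .. $#header) {
        # The prefix is the part before the dot, e.g. "c1" in "c1.20022".
        my ($prefix) = $header[$i] =~ /(.*)\./;
        # Keep a column only if the previous column had the same prefix,
        # i.e. it is not the first column of its set.
        if ($prefix eq $last) {
            push @keep, $i + 1;    # +1 because data rows start with a row number
        }
        $last = $prefix;
    }

    # Shift the header right by one so its indices line up with the data rows.
    unshift @header, q();
    say join "\t", @header[@keep];
    while (<>) {
        my @columns = split;
        # Index 0 (the row number) is not in @keep, so it is not printed.
        say join "\t", @columns[@keep];
    }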
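Update (for rows whose digits are not separated by spaces):

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    # Read the header line and split it into column names.
    my @header = split ' ', <>;
    my $last = q();
    my @keep;
    for my $i (0 .. $#header) {
        my ($prefix) = $header[$i] =~ /(.*)\./;
        # Keep every column that is not the first one of its set.
        if ($prefix eq $last) {
            push @keep, $i;
        }
        $last = $prefix;
    }

    say join "\t", @header[@keep];
    while (<>) {
        # Each row is a line number followed by one unbroken string of digits.
        my ($line_number, $all_digits) = split;
        my @digits = split //, $all_digits;
        say join "\t", $line_number, join q(), @digits[@keep];
    }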
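
To check which positions the header loop picks, here is a minimal sketch that runs the same prefix comparison on the sample header from the question. The helper name `keep_indices` is hypothetical and not part of the answer above. Treating a name without a dot as its own prefix is also an assumption; the scripts above do not handle that case.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# keep_indices is a hypothetical helper name used only for this illustration.
# Given the column names, it returns the 0-based positions of every column
# that is NOT the first one of its set (set = the part before the dot).
sub keep_indices {
    my @names = @_;
    my $last  = q();
    my @keep;
    for my $i (0 .. $#names) {
        my ($prefix) = $names[$i] =~ /(.*)\./;
        # Assumption: a name without a dot is treated as its own prefix.
        $prefix = $names[$i] unless defined $prefix;
        push @keep, $i if $prefix eq $last;
        $last = $prefix;
    }
    return @keep;
}

my @header = qw(c1.20022 c1.31012 c2.44444 c2.87634 c2.22233 c3.00444 c3.44444);
my @keep   = keep_indices(@header);

print "indices: @keep\n";             # indices: 1 3 4 6
print "names:   @header[@keep]\n";    # names:   c1.31012 c2.87634 c2.22233 c3.44444

# The same indices applied to the first unspaced row from the question.
my @digits = split //, '1010001';
print "digits:  ", join(q(), @digits[@keep]), "\n";    # digits:  0001
```

The indices are 0-based, which matches the second script. The first script adds 1 to each index because its data rows carry a leading row number as a separate field.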