bioinformatics - subroutine in perl to perform over #ARGV entries -

i've got small programme processes lists of blast hits, , checks see if there's overlap between blast results iterating blast results (as hash key) through hashes containing each blast list.

this involves processing each blast input file $argv in same way. depending on i'm trying achieve, might want compare 2, 3 or 4 blast lists gene overlap. want know how can write basic processing block subroutine can iterate on many $argv arguments exist.

for example, below works fine if input 2 blast lists:

#!/usr/bin/perl -w use strict;  use file::slurp; use data::dumper; $data::dumper::sortkeys = 1;  if ($#argv != 1){     die "usage: intersect.pl <de gene list 1><de gene list 2>\n" }     $input1 = $argv[0]; open $blast1, '<', $input1 or die $!;  $results1 = 0; (@blast1id, @blast1_info,  @split); while (<$blast1>) {     chomp;     @split = split('\t');     push @blast1_info, $split[0];     push @blast1id, $split[2];     $results1++; }    print "$results1 blast hits in $input1\n";  %blast1; push @{$blast1{$blast1id[$_]} }, [ $blast1_info[$_] ] 0 .. $#blast1id; #print dumper (\%blast1);  $input2 = $argv[1]; open $blast2, '<', $input2 or die $!;  $results2 = 0; (@blast2id, @blast2_info); while (<$blast2>) {     chomp;     @split = split('\t');     push @blast2_info, $split[0];     push @blast2id, $split[2];     $results2++; }     %blast2; push @{$blast2{$blast2id[$_]} }, [ $blast2_info[$_] ] 0 .. $#blast2id; #print dumper (\%blast2);  print "$results2 blast hits in $input2\n";

but able adjust cater 3 or 4 blast lists inputs. imagine sub routine work best this, invoked each input, , might this:

sub process {  $input$i = $argv[$i-1]; open $blast$i, '<', $input[$i] or die $!;  $results$i = 0; (@blast$iid, @blast$i_info,  @split); while (<$blast$i>) {     chomp;     @split = split('\t');     push @blast$i_info, $split[0];     push @blast$iid, $split[2];     $results$i++; }    print "$results$i blast hits in $input$i\n";  print dumper (\@blast$i_info); print dumper (\@blast$iid);  # call sub 'process every argv...  &process 0 .. $#argv;

update:

i've removed hash part last snippet.

the resultant data structure should be:

4 blast hits in <$input$i>

$var1 = [           'tcons_00001332(xloc_000827),_4.60257:9.53943,_change:1.05146,_p:0.03605,_q:0.998852',           'tcons_00001348(xloc_000833),_0.569771:6.50403,_change:3.51288,_p:0.0331,_q:0.998852',           'tcons_00001355(xloc_000837),_10.8634:24.3785,_change:1.16613,_p:0.001,_q:0.998852',           'tcons_00002204(xloc_001374),_0.316322:5.32111,_change:4.07226,_p:0.00485,_q:0.998852', ];    $var1 = [           'gi|50418055|gb|bc078036.1|_xenopus_laevis_cdna_clone_mgc:82763_image:5156829,_complete_cds',           'gi|283799550|emb|fn550108.1|_xenopus_(silurana)_tropicalis_mrna_for_alpha-2,3-sialyltransferase_st3gal_v_(st3gal5_gene)',           'gi|147903202|ref|nm_001097651.1|_xenopus_laevis_forkhead_box_i4,_gene_1_(foxi4.1),_mrna',           'gi|2598062|emb|aj001730.1|_xenopus_laevis_mrna_for_xsox17-alpha_protein', ];

and input:

tcons_00001332(xloc_000827),_4.60257:9.53943,_change:1.05146,_p:0.03605,_q:0.998852     0.0     gi|50418055|gb|bc078036.1|_xenopus_laevis_cdna_clone_mgc:82763_image:5156829,_complete_cds tcons_00001348(xloc_000833),_0.569771:6.50403,_change:3.51288,_p:0.0331,_q:0.998852     0.0     gi|283799550|emb|fn550108.1|_xenopus_(silurana)_tropicalis_mrna_for_alpha-2,3-sialyltransferase_st3gal_v_(st3gal5_gene) tcons_00001355(xloc_000837),_10.8634:24.3785,_change:1.16613,_p:0.001,_q:0.998852       0.0     gi|147903202|ref|nm_001097651.1|_xenopus_laevis_forkhead_box_i4,_gene_1_(foxi4.1),_mrna tcons_00002204(xloc_001374),_0.316322:5.32111,_change:4.07226,_p:0.00485,_q:0.998852    0.0     gi|2598062|emb|aj001730.1|_xenopus_laevis_mrna_for_xsox17-alpha_protein

you can't inject variable value in middle of variable name. (well, you can shouldn't. , can't use array indexing in middle of name.)

these names aren't valid:

@blast[$i]_info @blast[$i]_id

you need move index end:

@blast_info[$i] @blast_id[$i]

that said, i'd rid of arrays , use hash instead.

your second code snippet doesn't show call subroutine. unless it's explicitly called never run , program nothing. i'd modify process sub take single argument , call each element of @argv. e.g.

process($_) foreach @argv;

here's how i'd write program:

use strict; use warnings; use data::dumper;  @blast; push @blast, process($_) foreach @argv;  print dumper(\@blast);  sub process {     $file = shift;      open $fh, '<', $file or die "can't read file '$file' [$!]\n";      %data;      while (<$fh>) {         chomp;         ($id, undef, $info) = split '\t';         $data{$id} = $info;     }      return \%data; }

it isn't quite clear resulting data structure should like. (i took best guess.) recommend reading perlreftut gain better basic understanding of references , using them build data structures in perl.

Search This Blog

Copy

bioinformatics - subroutine in perl to perform over #ARGV entries -

Comments

Post a Comment

Popular posts from this blog

asp.net - redirect .aspx with query string to html page using htaccess -

matlab - Deleting rows with specific rules -

php - MySQLi multi_query results for later use -