bioinformatics - subroutine in perl to perform over #ARGV entries -
i've got small programme processes lists of blast hits, , checks see if there's overlap between blast results iterating blast results (as hash key) through hashes containing each blast list.
this involves processing each blast input file $argv in same way. depending on i'm trying achieve, might want compare 2, 3 or 4 blast lists gene overlap. want know how can write basic processing block subroutine can iterate on many $argv arguments exist.
for example, below works fine if input 2 blast lists:
#!/usr/bin/perl -w use strict; use file::slurp; use data::dumper; $data::dumper::sortkeys = 1; if ($#argv != 1){ die "usage: intersect.pl <de gene list 1><de gene list 2>\n" } $input1 = $argv[0]; open $blast1, '<', $input1 or die $!; $results1 = 0; (@blast1id, @blast1_info, @split); while (<$blast1>) { chomp; @split = split('\t'); push @blast1_info, $split[0]; push @blast1id, $split[2]; $results1++; } print "$results1 blast hits in $input1\n"; %blast1; push @{$blast1{$blast1id[$_]} }, [ $blast1_info[$_] ] 0 .. $#blast1id; #print dumper (\%blast1); $input2 = $argv[1]; open $blast2, '<', $input2 or die $!; $results2 = 0; (@blast2id, @blast2_info); while (<$blast2>) { chomp; @split = split('\t'); push @blast2_info, $split[0]; push @blast2id, $split[2]; $results2++; } %blast2; push @{$blast2{$blast2id[$_]} }, [ $blast2_info[$_] ] 0 .. $#blast2id; #print dumper (\%blast2); print "$results2 blast hits in $input2\n"; but able adjust cater 3 or 4 blast lists inputs. imagine sub routine work best this, invoked each input, , might this:
sub process { $input$i = $argv[$i-1]; open $blast$i, '<', $input[$i] or die $!; $results$i = 0; (@blast$iid, @blast$i_info, @split); while (<$blast$i>) { chomp; @split = split('\t'); push @blast$i_info, $split[0]; push @blast$iid, $split[2]; $results$i++; } print "$results$i blast hits in $input$i\n"; print dumper (\@blast$i_info); print dumper (\@blast$iid); # call sub 'process every argv... &process 0 .. $#argv; update:
i've removed hash part last snippet.
the resultant data structure should be:
4 blast hits in <$input$i>
$var1 = [ 'tcons_00001332(xloc_000827),_4.60257:9.53943,_change:1.05146,_p:0.03605,_q:0.998852', 'tcons_00001348(xloc_000833),_0.569771:6.50403,_change:3.51288,_p:0.0331,_q:0.998852', 'tcons_00001355(xloc_000837),_10.8634:24.3785,_change:1.16613,_p:0.001,_q:0.998852', 'tcons_00002204(xloc_001374),_0.316322:5.32111,_change:4.07226,_p:0.00485,_q:0.998852', ]; $var1 = [ 'gi|50418055|gb|bc078036.1|_xenopus_laevis_cdna_clone_mgc:82763_image:5156829,_complete_cds', 'gi|283799550|emb|fn550108.1|_xenopus_(silurana)_tropicalis_mrna_for_alpha-2,3-sialyltransferase_st3gal_v_(st3gal5_gene)', 'gi|147903202|ref|nm_001097651.1|_xenopus_laevis_forkhead_box_i4,_gene_1_(foxi4.1),_mrna', 'gi|2598062|emb|aj001730.1|_xenopus_laevis_mrna_for_xsox17-alpha_protein', ]; and input:
tcons_00001332(xloc_000827),_4.60257:9.53943,_change:1.05146,_p:0.03605,_q:0.998852 0.0 gi|50418055|gb|bc078036.1|_xenopus_laevis_cdna_clone_mgc:82763_image:5156829,_complete_cds tcons_00001348(xloc_000833),_0.569771:6.50403,_change:3.51288,_p:0.0331,_q:0.998852 0.0 gi|283799550|emb|fn550108.1|_xenopus_(silurana)_tropicalis_mrna_for_alpha-2,3-sialyltransferase_st3gal_v_(st3gal5_gene) tcons_00001355(xloc_000837),_10.8634:24.3785,_change:1.16613,_p:0.001,_q:0.998852 0.0 gi|147903202|ref|nm_001097651.1|_xenopus_laevis_forkhead_box_i4,_gene_1_(foxi4.1),_mrna tcons_00002204(xloc_001374),_0.316322:5.32111,_change:4.07226,_p:0.00485,_q:0.998852 0.0 gi|2598062|emb|aj001730.1|_xenopus_laevis_mrna_for_xsox17-alpha_protein
you can't inject variable value in middle of variable name. (well, you can shouldn't. , can't use array indexing in middle of name.)
these names aren't valid:
@blast[$i]_info @blast[$i]_id you need move index end:
@blast_info[$i] @blast_id[$i] that said, i'd rid of arrays , use hash instead.
your second code snippet doesn't show call subroutine. unless it's explicitly called never run , program nothing. i'd modify process sub take single argument , call each element of @argv. e.g.
process($_) foreach @argv; here's how i'd write program:
use strict; use warnings; use data::dumper; @blast; push @blast, process($_) foreach @argv; print dumper(\@blast); sub process { $file = shift; open $fh, '<', $file or die "can't read file '$file' [$!]\n"; %data; while (<$fh>) { chomp; ($id, undef, $info) = split '\t'; $data{$id} = $info; } return \%data; } it isn't quite clear resulting data structure should like. (i took best guess.) recommend reading perlreftut gain better basic understanding of references , using them build data structures in perl.
Comments
Post a Comment