R - Finding a sensible range
I've been struggling with this for a few days. This is my 3rd question on Stack Overflow on the same topic; I hope this time the question is better defined.
My data is distributed like this: (histogram)
The x-axis corresponds to a range of probabilities: 0 to 1.
I want to assign states, from state 1 to state 10, sensibly across the probability range.
This is what I have got:
interval <- round(quantile(datag$probability, seq(0, 1, by = 0.10)), 3)
Output:
   0%   10%   20%   30%   40%   50%   60%   70%   80%   90%  100%
0.000 0.008 0.015 0.024 0.036 0.054 0.080 0.124 0.209 0.397 1.000
Then I assign states from 0 to 10:
states <- data.frame(datag, state = findInterval(datag$probability, interval))
head(states)
Output:
probability state
 0.20585012     8
 0.21202839     9
 0.07087725     6
 0.7109513     10
 0.9641807     10
The problem is this: as you can see above, I have state 9 for probability 0.2120, and state 10 for anything > 0.710. I would be happy with prob=0.2120 in state 4, prob=0.710 in state 7, and prob=0.96 in state 10.
So how do I assign the states more uniformly?
To replicate datag:
datag <- data.frame(probability = rgamma(10000, shape = 0.6, rate = 4.8))
Edit: @roman:
datag <- subset(datag, probability<=1)
Edit: @simon
Yes, I'm aware of "cut":
table(cut(datag$probability, breaks = seq(0, 0.8, by = 0.1)))
Output:
  (0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8]
   125545     26625     12795      8126      5556      4108      3227      2606
How does one define the breaks? I am after the intervals (the breaks themselves), so that I can assign each probability the state corresponding to the interval it falls in.
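You've got the answer in your OP! Don't take this the wrong way, but I think you need to spend more time reading the documentation for ?cut! If you set labels = FALSE in cut, it returns the integer code of the interval each value falls into, rather than a factor of interval labels.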
# Set a seed for true reproducibility!
set.seed(1)
datag <- data.frame(probability = rgamma(10000, shape = 0.6, rate = 4.8))
# labels = FALSE makes cut() return integer bin codes instead of a factor
int <- cut(datag$probability, breaks = seq(0, 1, by = 0.1), labels = FALSE)
head(cbind(prob = datag$probability, int))
            prob int
[1,] 0.031860645   1
[2,] 0.455054687   5
[3,] 0.134175238   2
[4,] 0.058957301   1
[5,] 0.855493999   9
[6,] 0.009144936   1
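To see why the quantile approach in the question bunches the states up, it may help to run both sets of breaks on the same values. The sketch below is only an illustration: it reuses the five probabilities and the rounded decile breaks printed in the question, and the names p, decile_breaks, width_breaks, by_decile and by_width are made up here. include.lowest = TRUE is my own addition, so that a probability of exactly 0 would land in state 1 instead of NA.

```r
# Probabilities copied from the head(states) output in the question.
p <- c(0.20585012, 0.21202839, 0.07087725, 0.7109513, 0.9641807)

# Decile breaks as printed in the question: equal counts per bin,
# so the bins are very narrow where the data is dense (near 0).
decile_breaks <- c(0, 0.008, 0.015, 0.024, 0.036, 0.054,
                   0.080, 0.124, 0.209, 0.397, 1)

# Equal-width breaks: ten bins of width 0.1 covering [0, 1].
width_breaks <- seq(0, 1, by = 0.1)

# State from the quantile-based breaks (what the question did).
by_decile <- findInterval(p, decile_breaks)

# State from equal-width breaks; labels = FALSE returns integer codes.
# include.lowest = TRUE is an assumption of this sketch, not from the answer.
by_width <- cut(p, breaks = width_breaks, labels = FALSE,
                include.lowest = TRUE)

print(data.frame(probability = p, by_decile, by_width))
```

With these inputs the decile breaks give states 8, 9, 6, 10, 10 (matching the question), while the equal-width breaks give 3, 3, 1, 8, 10. Note that cut returns NA for anything outside the breaks, so values above 1 need to be dropped first, as the subset(datag, probability<=1) edit in the question already does.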