Learning to Solve Stochastic Shortest Path Problems

Mohsen Jamali
Semantic Web Research Laboratory,
Computer Engineering Department,
Sharif University of Technology, Tehran, Iran
m_jamali@ce.sharif.edu


Abstract. Different classes of Stochastic Shortest Path (SSP) problems are considered in the literature. In this paper we consider a special case in which edge costs have probability distributions. We try to learn the expected cost of each edge, and thereby learn the shortest path from a source node v_s to a destination node v_d. Our algorithm is somewhat like Dijkstra's algorithm, but it uses only local knowledge.
1 Introduction
The class of Stochastic Shortest Path (SSP) problems is a subset of Markov Decision Processes (MDPs) that is of central importance to AI: they are the natural generalization of the classic search model to the case of stochastic transitions and general cost functions. SSPs have recently been used to model a broad range of problems, from robot navigation and control of non-deterministic systems to stochastic game-playing and planning under uncertainty and partial information [1].
The theory of MDPs has received great attention from the AI community for three important reasons. First, it provides an easy framework for modeling complex real-life problems that have large (even infinite) state spaces and complex dynamics and cost functions. Second, MDPs provide the mathematical foundation for independently developed learning algorithms in Reinforcement Learning. And third, general and efficient algorithms for solving MDPs have been developed, the most important being Value Iteration and Policy Iteration.
As the name suggests, an SSP problem is an MDP problem that has positive costs and an absorbing goal state. A solution for an SSP is a strategy that leads to the goal with minimum expected cost from any other state. Quite often, we are only interested in how to get to the goal from a fixed initial state instead of knowing the general solution, the reason being that the state space usually contains many states that are irrelevant.
2 Problem Definition
A stochastic graph G is defined by a triple G = (V, E, F), where V = {1, 2, . . . , n} is the set of nodes, E ⊆ V × V is the set of edges, and the n × n matrix F is the probability distribution describing the statistics of edge costs. In particular, the cost C_{ij} of edge (i, j) is assumed to be a random variable with probability density function f_{ij}, which is not known a priori. In stochastic graph G, a path p_i with n_i nodes and expected cost L_{p_i} from source node v_s to destination node v_d is defined as an ordering {p_{i,1}, p_{i,2}, . . . , p_{i,n_i}} ⊆ V such that p_{i,1} = v_s and p_{i,n_i} = v_d are the source and destination nodes, respectively, and (p_{i,j}, p_{i,j+1}) ∈ E for 1 ≤ j < n_i, where p_{i,j} is the j-th node in path p_i. Assume that there are r distinct paths Π = {p_1, p_2, . . . , p_r} between v_s and v_d. The shortest path between source node v_s and destination node v_d is defined as the path with minimum expected cost. In other words, the shortest path p* has cost L_{p*} = min_{p ∈ Π} L_p. In our problem we want to find such shortest paths.
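To make this setting concrete, the following is a minimal sketch (our own illustration, not part of the original formulation) of a stochastic graph with discrete edge-cost distributions, of the kind used in the experiments of Section 5:

```python
import random

class StochasticGraph:
    """Directed graph whose edge costs are random variables (illustrative sketch)."""

    def __init__(self):
        self.neighbors = {}  # node -> list of successor nodes
        self.dist = {}       # (i, j) -> (costs, probabilities); hidden from the learner

    def add_edge(self, i, j, costs, probs):
        self.neighbors.setdefault(i, []).append(j)
        self.neighbors.setdefault(j, [])
        self.dist[(i, j)] = (costs, probs)

    def sample_cost(self, i, j):
        """Draw a cost C_ij from the (unknown) distribution f_ij of edge (i, j)."""
        costs, probs = self.dist[(i, j)]
        return random.choices(costs, weights=probs, k=1)[0]
```

For example, the first edge of Table 1 would be added as g.add_edge(1, 2, [7, 7.3, 9.4], [0.2, 0.5, 0.3]).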
3 Related Works
Different classes of stochastic optimal path problems are considered in the literature. The first work known in this area is due to Frank [2], where the shortest optimal path over a probabilistic graph is determined. The most common criterion for determining the optimal path is the one that maximizes the expected value of a utility function. This criterion stems from the von Neumann-Morgenstern formulation, in which evaluations should be made under uncertainty [3]. Algorithms for linear and quadratic utility functions have been presented by Mirchandani and Soroush [3], Murthy and Sarkar [4, 5], and more recently by Deolinda Dias Rasteiro, Antonio Batel Anjo and Helena Ramalhinho Lourenco [6].
3.1 Solving Stochastic Shortest-Path Problems with RTDP
Formally, an MDP is defined by the following components (a minimal sketch in code follows the list):
– A finite state space S = {1, . . . , n},
– A finite set of controls U(i) for each state i ∈ S,
– Transition probabilities p(i, u, j) for all u ∈ U(i), equal to the probability of the next state being j after applying control u in state i,
– A cost g(i, u) associated with each u ∈ U(i) and i ∈ S [7].
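As a sketch only (the container and field names are our assumptions, not from [7]), this tuple might be held as:

```python
from dataclasses import dataclass, field

@dataclass
class MDP:
    """Finite MDP (S, U, p, g) as defined in the list above; illustrative only."""
    n: int                                        # states are 1..n (plus goal t for SSPs)
    controls: dict = field(default_factory=dict)  # controls[i] = iterable U(i)
    p: dict = field(default_factory=dict)         # p[(i, u, j)] = transition probability
    g: dict = field(default_factory=dict)         # g[(i, u)] = cost of u in state i
```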
Blai Bonet and Hector Geffner [7] used Real-Time Dynamic Programming (RTDP) to solve SSPs. Their problem definition is somewhat different from ours, but they also solve SSP problems. In their formulation, a stochastic shortest-path problem is an MDP problem in which the state space S = {1, . . . , n, t} is such that t is a goal (target) state that is absorbing (i.e., p(t, u, t) = 1 and g(t, u) = 0 for all u ∈ U(t)), and the discount factor α = 1. In this case, the existence of optimal policies (and optimal stationary policies) is a major mathematical problem.
Real-Time Dynamic Programming (RTDP) [8] is an algorithm for finding shortest paths. However, RTDP is a probabilistic algorithm that only converges asymptotically. Hence, although there have been experimental results showing that RTDP converges faster than other algorithms, it cannot be used as an off-line algorithm.
RTDP is a probabilistic algorithm that computes a partial optimal policy by performing successive walks (also called trials) on the state space. Each trial starts at the initial state 1 and finishes at the goal state t. At all times k, the RTDP algorithm maintains an approximation J_k to J^* that is used to greedily select a control u_k to apply in the current state x_k. Initially, J_0 is implicitly stored as a heuristic function h(·). Then, every time a control u_k is selected in state x_k, a new approximation J_{k+1} is computed as J_{k+1}(x) = J_k(x) if x ≠ x_k, and

J_{k+1}(x_k) = g(x_k, u_k) + \sum_{i=1}^{n} p(x_k, u_k, i) \, J_k(i)
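A sketch of this greedy selection and backup, reusing the hypothetical MDP container above (J is a dictionary from states to current value estimates):

```python
def rtdp_backup(mdp, J, x):
    """One RTDP step at state x: greedy control u_k, then the J_{k+1}(x_k) update."""
    def q(u):  # expected one-step cost of control u, as in the update equation
        return mdp.g[(x, u)] + sum(mdp.p.get((x, u, i), 0.0) * J[i]
                                   for i in range(1, mdp.n + 1))
    u_best = min(mdp.controls[x], key=q)  # greedily selected control
    J[x] = q(u_best)                      # in-place Bellman backup
    return u_best
```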
3.2 Stochastic Shortest Path Problems with Piecewise-Linear Concave Utility Functions
Ishwar Murthy and Sumit Sarkar [5] considered a stochastic shortest path problem where the arc lengths are independent random variables following a normal distribution. In this problem, the optimal path is one that maximizes the expected utility, with the utility function being piecewise-linear and concave. Such a utility function can be used to approximate nonlinear utility functions that capture risk-averse behavior for a wide class of problems. The principal contribution of their work is the development of exact algorithms that solve large problem instances. Two algorithms are developed and incorporated in labelling procedures. Computational testing is done to evaluate the performance of the algorithms. Overall, both algorithms are very effective in solving large problems quickly. The relative performance of the two algorithms is found to depend on the "curvature" of the piecewise-linear utility function.
3.3 Shortest Paths’ Probability Distribution in Probabilistic Graphs
H. Frank [2] also considers the problem of finding the shortest-path probability distribution in graphs whose branches are weighted with random lengths. He examines the consequences of various assumptions concerning the nature of the available statistical information, and gives an exact method for computing the probability distribution, as well as methods based on hypothesis testing and statistical estimation.
4 Our Proposed Algorithm
In our problem, we do not have the probability distributions of the edges' costs. If we had these distributions, the solution would be trivial: we would simply calculate the expected value of each edge's cost from its probability distribution and then use Dijkstra's algorithm to find the shortest path. The algorithm we propose tries to solve the problem defined in Section 2 by learning estimates of each edge's cost. Our algorithm learns these estimates iteratively. Each node in the graph has edges to its neighbors, except for the destination node. Each edge's cost has its own probability distribution. Each node also holds an estimate of the length (cost) of the shortest path to the destination node, which should be learned and updated, and the node also learns the estimated costs of its edges to its neighbors.
In our algorithm, a walker traverses the graph from the source node to the destination node. At each node n, the walker selects the edge (n, m) that leads to the smallest estimated cost from node n to the destination node, according to the node's current knowledge of the cost estimates:
m = \arg\min_{j \in V,\, (n,j) \in E} \{ C^*_{nj} + est_j \} \qquad (1)

where C^*_{nj} is the estimated (learned) cost of the edge (n, j), and est_j is the learned estimate of the least cost to reach the destination node from node j.
V and E are, respectively, the sets of nodes and edges as defined in Section 2. We use an ε-greedy strategy, so that at each node there is a probability ε of selecting a random node in the neighborhood. The probability of randomly selecting a neighbor decreases at each step as the cost estimates are learned.
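A minimal sketch of this selection rule, with hypothetical names: C_est holds the learned edge-cost estimates C^*, est holds each node's cost-to-destination estimate, and visited enforces the no-revisit rule introduced below:

```python
import random

def select_next(n, neighbors, C_est, est, visited, eps):
    """Epsilon-greedy choice of the next node from n, per equation (1)."""
    options = [j for j in neighbors[n] if j not in visited]
    if not options:
        return None                       # dead end; the trial is abandoned
    if random.random() < eps:             # explore a random neighbor
        return random.choice(options)
    # exploit: the j minimizing C*_{nj} + est_j
    return min(options, key=lambda j: C_est[(n, j)] + est[j])
```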
While traversing the graph, each edge's estimated cost is updated and learned. When passing an edge, we incur a cost drawn from the probability distribution assigned to it. So we update our estimate of the edge's cost as follows:

C^*_{ij}(k+1) = \frac{k \cdot C^*_{ij}(k) + C_{ij}}{k+1} \qquad (2)

C^*_{ij}(k) represents our estimate of the cost of edge (i, j) at the k-th step, and C_{ij} is the cost paid after traversing edge (i, j), which follows its probability distribution. In effect, we replace our estimated cost with the weighted mean of the previous estimate and the newly paid cost.
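Equation (2) is an incremental running mean, which might be implemented as follows (count tracks how many times each edge has been traversed; names are ours):

```python
def update_edge_estimate(C_est, count, i, j, paid_cost):
    """Running-mean update of C*_{ij} after paying a sampled cost, per equation (2)."""
    k = count.get((i, j), 0)
    C_est[(i, j)] = (k * C_est[(i, j)] + paid_cost) / (k + 1)
    count[(i, j)] = k + 1
```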
The walker finally reaches the destination node (we do not let it revisit nodes, to avoid falling into loops, so it always reaches the final destination). When the walker arrives at the destination node, we update the estimated cost for each node to reach the destination node:

est_i = \min_{(i,j) \in E} \{ C^*_{ij} + est_j \} \qquad (3)

We continue iterating until there is no change, or until the mean amount of change is less than a constant.
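Combining the pieces, one iteration (trial) of the walker might look like the sketch below; select_next and update_edge_estimate are the helpers sketched earlier, sample_cost draws a cost from the hidden edge distribution, and taking the destination's cost-to-go as zero is our assumption:

```python
def run_trial(v_s, v_d, neighbors, C_est, est, count, sample_cost, eps):
    """One walk from v_s to v_d followed by the value update of equation (3)."""
    node, visited = v_s, {v_s}
    while node != v_d:
        nxt = select_next(node, neighbors, C_est, est, visited, eps)
        if nxt is None:
            break                         # walker got stuck; abandon this trial
        update_edge_estimate(C_est, count, node, nxt, sample_cost(node, nxt))
        visited.add(nxt)
        node = nxt
    # Equation (3): refresh every node's estimated cost to the destination.
    est[v_d] = 0.0                        # cost-to-go at the destination (assumed)
    for i in neighbors:
        if i != v_d and neighbors[i]:
            est[i] = min(C_est[(i, j)] + est[j] for j in neighbors[i])
```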
5 Experimental Results
In order to evaluate the performance of our algorithm, we simulated it on the following four stochastic graphs.
– Graph 1, shown in Figure 1, is a graph with 10 nodes and 23 edges, v_s = 1, and v_d = 10. The cost of each edge is a random variable with an exponential distribution. The label on each edge of the graph in Figure 1 is the mean of its exponential distribution.
– Graph 2 is a graph with 10 nodes, 23 edges, v_s = 1, and v_d = 10. The edge cost distribution is given in Table 1.
– Graph 3 is a graph with 10 nodes, 23 edges, v_s = 1, and v_d = 10. The edge cost distribution is given in Table 2.
– Graph 4 is a graph with 15 nodes, 42 edges, v_s = 1, and v_d = 15. The edge cost distribution is given in Table 3.
Fig. 1. Graph 1 to be used in experiments
We evaluate each run of the algorithm on the above graphs by comparing its result with the result of Dijkstra's algorithm. In fact, Dijkstra's algorithm run on the true expected costs gives the best possible estimate of the shortest path, but we assume that we do not have the distributions and try to learn the expected costs from local knowledge. In our experiments we assigned a default value (10) to all estimates (both the estimates of the edges' costs and the estimates of each node's cost to reach the destination node). The number of iterations for each run is set to 1000.
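As a usage illustration, a hypothetical driver mirroring this setup (all estimates start at the default value of 10 and each run performs 1000 trials; the decaying exploration schedule is our own choice, since the text only states that the exploration probability decreases):

```python
def run_experiment(g, v_s, v_d, n_iters=1000, default_est=10.0):
    """Learn edge-cost and cost-to-destination estimates on a StochasticGraph g."""
    C_est = {e: default_est for e in g.dist}         # edge-cost estimates
    est = {i: default_est for i in g.neighbors}      # node cost-to-destination estimates
    count = {}
    for k in range(n_iters):
        eps = 1.0 / (k + 1)                          # assumed decay of exploration
        run_trial(v_s, v_d, g.neighbors, C_est, est, count, g.sample_cost, eps)
    return C_est, est
```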
We ran the algorithm 10 times on each graph. Table 4 shows the results of our experiments. Each row of the table shows the results for one graph. The columns contain the paths identified by our algorithm and by Dijkstra's algorithm. There are also two columns for residuals. The values in the columns are means over the 10 runs of the algorithm. The edge cost residual is the error in finding the edges' costs:

\text{Edge Cost Residual} = \frac{1}{|E|} \sum_{(i,j) \in E} (C^*_{ij} - C_{ij})^2 \qquad (4)

where C^*_{ij} is our learned cost for the edge (i, j) and C_{ij} is the expected value of the edge's cost according to its distribution.

Edge | Lengths (Costs) | Probabilities
( 1 , 2 ) | 7 7.3 9.4 | 0.2 0.5 0.3
( 1 , 3 ) | 2.5 3.5 8.2 | 0.5 0.4 0.1
( 1 , 4 ) | 4.2 4.8 6.1 | 0.2 0.3 0.5
( 2 , 5 ) | 2.6 3.1 5.5 8.8 9 | 0.1 0.2 0.4 0.2 0.1
( 2 , 6 ) | 5.8 7 9.5 | 0.3 0.3 0.4
( 3 , 2 ) | 1.5 7.3 | 0.4 0.6
( 3 , 7 ) | 6.5 7.4 7.5 | 0.4 0.5 0.1
( 3 , 8 ) | 5.9 7.2 9.8 | 0.6 0.3 0.1
( 4 , 3 ) | 2.1 3.2 8.5 9.8 | 0.3 0.2 0.3 0.2
( 4 , 9 ) | 8.9 9.6 | 0.7 0.3
( 5 , 7 ) | 3.2 4.8 6.7 | 0.2 0.2 0.6
( 8 , 4 ) | 9 9.6 | 0.5 0.5
( 5 , 10 ) | 6.3 6.9 | 0.5 0.5
( 6 , 5 ) | 0.6 1.5 3.9 5.8 | 0.1 0.4 0.3 0.2
( 7 , 8 ) | 1.6 1.8 4 5.2 | 0.2 0.3 0.3 0.2
( 8 , 9 ) | 1.7 4.9 5.3 6.5 | 0.1 0.4 0.4 0.1
( 9 , 10 ) | 0.6 1.2 5.4 6.6 | 0.1 0.1 0.3 0.5
( 7 , 6 ) | 6.1 6.3 8.5 | 0.2 0.3 0.5
( 6 , 3 ) | 6.5 8.5 9.8 | 0.8 0.1 0.1
( 7 , 10 ) | 1.6 3.4 7.1 | 0.1 0.5 0.4
( 8 , 7 ) | 2.1 4.6 8.5 | 0.3 0.4 0.3
( 7 , 9 ) | 0.3 0.3 5 | 0.1 0.1 0.5
Table 1. Weight Distribution of Graph 2 with v_s = 1 and v_d = 10
The path length residual is the error in finding the cost to reach the destination node from different start nodes:

\text{Path Length Residual} = \frac{1}{|V|} \sum_{i \in V} (est^*_i - est_i)^2 \qquad (5)

where est^*_i is our learned cost of the shortest path from node i to the destination v_d, and est_i is the expected cost of the shortest path computed by Dijkstra's algorithm.
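A sketch of computing both residuals, given the learned values and the ground truth (C_true holding expected edge costs, est_dijkstra holding Dijkstra's costs-to-destination; all names are ours):

```python
def residuals(C_est, C_true, est_learned, est_dijkstra):
    """Mean squared errors of equations (4) and (5)."""
    edge_res = sum((C_est[e] - C_true[e]) ** 2 for e in C_true) / len(C_true)
    path_res = sum((est_learned[i] - est_dijkstra[i]) ** 2
                   for i in est_dijkstra) / len(est_dijkstra)
    return edge_res, path_res
```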
Comparing the results shows that our algorithm's results are very similar to those of Dijkstra's algorithm. The shortest paths identified by our algorithm are exactly the same as the paths identified by Dijkstra's algorithm. So our algorithm learns well and the results are satisfying: the precision in finding the shortest path is 100%, and the errors in the edges' expected costs are usually low.
Figure 2 shows the rate of convergence of our algorithm to the results of Dijkstra's algorithm over the iterations. After each iteration we compute the ratio of our algorithm's estimate of the shortest path cost to Dijkstra's result (est^*_{v_s} / est_{v_s}). Figure 2 shows this ratio only for the first 100 iterations; after the 100th iteration the variance of the ratio is nearly 0, so we do not show the rest of the diagram.

Fig. 2. Convergence Rate of Our Algorithm to Dijkstra's Algorithm

Edge | Lengths (Costs) | Probabilities
( 1 , 2 ) | 3 5.3 7.4 9.4 | 0.2 0.2 0.3 0.2
( 1 , 3 ) | 3.5 6.2 7.9 8.5 | 0.3 0.3 0.2 0.2
( 1 , 4 ) | 4.2 6.1 6.9 8.9 | 0.2 0.3 0.2 0.3
( 2 , 5 ) | 2.6 4.1 5.5 9 | 0.2 0.2 0.4 0.2
( 2 , 6 ) | 5.8 7 8.5 9.6 | 0.3 0.3 0.2 0.2
( 3 , 2 ) | 1.5 2.3 3.6 4.5 | 0.2 0.2 0.3 0.3
( 3 , 7 ) | 6.5 7.2 8.3 9.4 | 0.5 0.2 0.2 0.1
( 3 , 8 ) | 5.9 7.8 8.6 9.9 | 0.4 0.3 0.1 0.2
( 4 , 3 ) | 2.1 3.2 4.5 6.8 | 0.2 0.2 0.3 0.3
( 4 , 9 ) | 1.1 2.2 3.5 4.3 | 0.2 0.3 0.4 0.1
( 5 , 7 ) | 3.2 4.8 6.7 8.2 | 0.2 0.2 0.3 0.3
( 5 , 10 ) | 6.3 7.8 8.4 9.1 | 0.2 0.2 0.4 0.2
( 6 , 3 ) | 6.8 7.7 8.5 9.6 | 0.4 0.1 0.1 0.4
( 6 , 5 ) | 0.6 1.5 3.9 5.8 | 0.2 0.2 0.3 0.3
( 6 , 7 ) | 2.1 4.8 6.6 7.5 | 0.2 0.4 0.2 0.2
( 7 , 6 ) | 4.1 6.3 8.5 9.3 | 0.2 0.3 0.3 0.2
( 7 , 8 ) | 1.6 2.8 5.2 6 | 0.2 0.3 0.3 0.2
( 7 , 10 ) | 1.6 3.4 8.2 9.3 | 0.2 0.3 0.3 0.2
( 8 , 4 ) | 7 8 8.8 9.4 | 0.2 0.2 0.2 0.4
( 8 , 7 ) | 2.1 4.6 8.5 9.6 | 0.4 0.2 0.2 0.2
( 8 , 9 ) | 1.7 4.9 6.5 7.8 | 0.2 0.2 0.2 0.4
( 7 , 9 ) | 3.5 4 5 7.7 | 0.1 0.2 0.4 0.3
( 9 , 10 ) | 4.6 6.4 7.6 8.9 | 0.4 0.1 0.2 0.3
Table 2. Weight Distribution of Graph 3 with v_s = 1 and v_d = 10
6 Conclusion
In this paper we tried to learn an estimate of each edge's cost in a stochastic weighted graph from local knowledge. The results show that our experiment is satisfying and the walker learns successfully. I would like to thank H. Beigy for his comments on the problem.
References
1. Bonet, B., Geffner, H.: Planning with incomplete information as heuristic search in belief space. In: Proceedings of AIPS-2000, Breckenridge, CO, USA (2000)
2. Frank, H.: Shortest paths in probabilistic graphs. Operations Research 17 (1969) 583–599
3. Loui, R.P.: Optimal paths in graphs with stochastic or multidimensional weights. Communications of the ACM 26 (1983) 670–676
4. Murthy, I., Sarkar, S.: A relaxation-based pruning technique for a class of stochastic shortest path problems. Transportation Science 30(3) (1996) 220–236
5. Murthy, I., Sarkar, S.: Stochastic shortest path problems with piecewise-linear concave utility functions. Management Science 44(11) (1998) 125–136

6. Rasteiro, D.M., Anjo, A.B.: Metaheuristics for stochastic shortest path problem. In: Proceedings of the 4th Metaheuristics International Conference (MIC2001), Porto, Portugal (2001)
7. Bonet, B., Geffner, H.: Solving stochastic shortest-path problems with RTDP. Technical report, Universidad Simon Bolivar (2002)
8. Barto, A., Bradtke, S., Singh, S.: Learning to act using real-time dynamic programming. Artificial Intelligence 72 (1995) 81–138

Edge | Lengths (Costs) | Probabilities
( 1 , 2 ) | 16 25 36 | 0.6 0.3 0.1
( 1 , 4 ) | 11 13 26 | 0.4 0.4 0.2
( 2 , 11 ) | 24 28 31 | 0.5 0.3 0.2
( 2 , 6 ) | 13 37 39 | 0.6 0.2 0.2
( 3 , 2 ) | 11 20 24 | 0.6 0.3 0.1
( 3 , 7 ) | 23 30 34 | 0.4 0.3 0.3
( 3 , 8 ) | 14 23 34 | 0.5 0.4 0.1
( 6 , 7 ) | 11 31 37 | 0.5 0.4 0.1
( 6 , 5 ) | 18 25 29 | 0.5 0.3 0.2
( 5 , 10 ) | 27 33 40 | 0.4 0.3 0.3
( 4 , 12 ) | 16 25 37 | 0.5 0.4 0.1
( 13 , 15 ) | 12 31 | 0.9 0.1
( 2 , 5 ) | 11 30 | 0.7 0.3
( 10 , 14 ) | 23 34 | 0.9 0.1
( 4 , 3 ) | 22 30 | 0.7 0.3
( 4 , 9 ) | 35 40 | 0.6 0.4
( 12 , 9 ) | 16 22 | 0.7 0.3
( 5 , 15 ) | 25 32 | 0.7 0.3
( 6 , 13 ) | 21 23 | 0.5 0.5
( 6 , 3 ) | 18 24 | 0.7 0.3
( 7 , 10 ) | 19 23 37 | 0.6 0.2 0.2
( 7 , 6 ) | 12 23 31 | 0.5 0.3 0.2
( 8 , 7 ) | 14 34 39 | 0.6 0.2 0.2
( 8 , 9 ) | 13 31 32 | 0.8 0.1 0.1
( 8 , 4 ) | 13 23 34 | 0.4 0.3 0.3
( 9 , 7 ) | 10 17 20 | 0.6 0.3 0.1
( 9 , 14 ) | 19 24 29 | 0.4 0.3 0.3
( 10 , 15 ) | 15 19 25 | 0.4 0.3 0.3
( 11 , 13 ) | 13 31 25 | 0.6 0.3 0.1
( 11 , 6 ) | 10 19 39 | 0.5 0.4 0.1
( 14 , 15 ) | 14 19 32 | 0.5 0.3 0.2
( 12 , 8 ) | 15 36 39 | 0.5 0.3 0.2
( 7 , 8 ) | 12 15 22 24 | 0.3 0.3 0.2 0.2
( 10 , 13 ) | 14 20 25 32 | 0.3 0.3 0.2 0.2
( 5 , 7 ) | 15 17 19 26 | 0.3 0.3 0.3 0.1
( 8 , 14 ) | 14 15 27 32 | 0.3 0.3 0.2 0.2
( 9 , 15 ) | 12 13 25 32 | 0.4 0.3 0.2 0.1
( 9 , 10 ) | 16 18 36 39 | 0.3 0.3 0.2 0.2
( 11 , 5 ) | 18 19 29 23 | 0.3 0.3 0.3 0.1
( 5 , 13 ) | 28 35 37 40 | 0.4 0.3 0.2 0.1
( 1 , 3 ) | 21 24 25 39 | 0.5 0.2 0.2 0.1
( 12 , 14 ) | 10 13 18 34 | 0.3 0.3 0.3 0.1
Table 3. Weight Distribution of Graph 4 with v_s = 1 and v_d = 15

Graph | Learned Shortest Path | Learned Shortest Cost | Shortest Path by Dijkstra | Dijkstra's Shortest Length | Edge Cost Residual | Path Length Residual
Graph 1 | 1 3 8 10 | 35.12 | 1 3 8 10 | 34 | 0.15 | 0.51
Graph 2 | 1 3 7 10 | 15.24 | 1 3 7 10 | 14.57 | 0.16 | 0.21
Graph 3 | 1 4 9 10 | 16.23 | 1 4 9 10 | 16.1 | 0.16 | 0.11
Graph 4 | 1 2 5 15 | 64.98 | 1 2 5 15 | 64.5 | 0.75 | 0.37
Table 4. Results of our algorithm compared with Dijkstra's algorithm
