
Hi,
I need to work with huge graphs (10 million nodes, 200 million edges) with edges Edge<int, int>. Is there a way to allocate and initialize the edges with one memory allocation in an array, preferably as value type, as opposite to
creating 200 million tiny objects? How much memory should I expect such a graph to take per vertex/edge?
Thanks,
Vlad



Yes, use SEdge<int,int> instead which is a value type. Let's see: 8byte * 200 * 10^6 = 1600Mb and that's a lower bound. If you have a 64bit machine it should not be an issue, otherwize QuickGraph might not be the best tool to deal with large
graphs. The PBGL (http://www.osl.iu.edu/research/pbgl/) is probably a more scalable solution.


May 28, 2009 at 3:19 AM
Edited May 28, 2009 at 3:27 AM

Thanks for your suggestion. I'm trying to replace edges with SEdges, as now without it I'm running out of memory building a graph of 10M edges. (I have a 64bit, 4G Ram machine). But unless QuickGraph has some significant overhead over Boost, i.e. much
more than 8 bytes per intint edge, plus 4 bytes per cost, I would be fine.
Just one more question. How would you initialize a static undirected clique on integers, say range [1..10000], with score equal to 1 for every edge (I would like to run a spanning tree algorithm on it)?
Would you use AddVerticesAndEdge range? And how would store costs efficiently? If I understand correctly, just using Dictionary will create a linked list of all values for given hash key, i.e. every int will be stored as an object.
Thanks,
Vlad


May 28, 2009 at 3:25 AM
Edited May 28, 2009 at 3:26 AM

A clarification: I can either write a function
int Cost(SEdge e)
or write my own bijective hash function Edges>[0..num_edges1] and store the Costs in the array or List.
Thanks,
Vlad



do you plan to mutate the costs? If not, you could simple use a delegate that returns 1 and be done. If some of the costs would have to be mutated, you could have a small dictionary containing the mutated costs  in that sense you would pay a lesser cost
on memory.
You could also implement your own struct that implements IEdge and has a cost field. You probably want to add the vertices (AddVertexRange), then the edges (AddEdgeRange)  it should be slightly more efficient.



Thanks for you quick response. No, I will not mutate neither costs nor the graph  everything is static. Costs of 1 in a clique were just an example, I now build an array of costs, but will use STaggedEdge<TVertex,
TTag> edge. I assume using tag for edge costs is OK.
I just don't quite understand what AddEdgeRange does internally. If I pass it List<SEdge>, since the function expects IEnumerable, it would not reuse the array and instead would allocate memory for Edges individually (and
copy them to a List<Edge> anyway ?), 200 million times in my case, am I right?
Thanks,
Vlad



In your case where the graph is immutable, we probably need a new/better data structure to represent it.



I've added ArrayAdjacencyGraph which uses a more consice representation. There's a bug in the contracts rewritter so I have not tested it thorousgly yet.



Thanks, that was fast!
I'm reading through the code now, it looks it does not support undirected graph yet. Sorry, if I was not particularly clear, I'm running a spanning tree algorithms on an undirected graph.
Thanks,
Vlad



Hi,
I have an idea how to overcome the scalability problem and I'm interested if it's feasible or, maybe, even already present in QuickGraph.
Is it possible to provide instead of edges enumerators or delegates that given a vertex produce the set of its neighbors? For example, in my particular case vertices are short strings and edges connect similar strings, with weigh equal to the degree of similarity.
Instead of storing a huge set of outgoing or incoming edges anywhere to be used just once, I could compute them on asneeded basis. This recomputing will work for near O(n) algorithms, like spanning tree or strongly connected components and not so good for,
say, flow. But the good thing is that the graph interface is decoupled from the data structure, and if the recomputation is expensive, the user can store the data in array or what ever is efficient for her and just compute the delegates with ability to reuse
all the algorithms implemented in QuickGraph.
One problem that I see here is that precomputing and storing O(V) delegates is not particularly space efficient, higher order functions would be required, AFAIK, here  QuickGraph can receive a function that given a vertex index computes a function that enumerate
the neighbors/edges. I'm using QuickGraph with F#, and this is easily done there, not familiar with C# yet.
Is this something feasible, already available or I'm missing something major?
Thanks,
Vlad



Take a look at the FunctionalImplicitGraph to see if this is something that could fit your needs. It relies solely on a delegate of the form:
bool TryFunc<TVertex, IEnumerable<TEdge>>(TVertex v, out IEnumerable<TEdge> edges);
so you can lazily get the out edges and enumerate them.



I've renamed it to DelegateImplicitUndirectedGraph. Let's what you can do with this one.


May 31, 2009 at 2:45 AM
Edited May 31, 2009 at 8:00 PM

Thanks,
It looks this will work for me. Do you have any idea how much memory will it take internally if I run a spanning tree algorithm? I guess
Kruskal will not be efficient as it has to put in advance all edges into the heap or sort them by cost, so
laziness in graph representation will not help much. But
Prim's heap contains only the edges between the "supernode" and its complement, so theoretically, we are talking about $O(V)$ worst case memory used internally. Is it how it works?
One newbie question  I could not find build instructions anywhere. How do I create QuickGraph.dll from the source? I opened sources/QuickGraph.sln in Visual Studio 2010 Beta 1, it
offered to convert something; then, when I tried to build the project I got a bunch of errors.



Yeah with respect to the build, just unload whatever project that does not build. Msal/Glee require different components that are not checked in the depot. QuickGraph is the root project and should build.



QuickGraph contains Kruskal's algorithm only. I did not migrate Prim's implementation from QuickGraph 1.0.



I just noticed that even superlinear algorithms may have no time overhead due to lazy edge computation in implicit graphs. For example, while Prim is superlinear, each edge is iterated only once, so disregarding the differences in cache performance, the
time will be exactly the same as if all E edges are precomputed.



I correct a previous comment: Prim is supported see http://quickgraph.codeplex.com/Wiki/View.aspx?title=Minimum%20Spanning%20Tree&referringTitle=Home.
Curious to see, how far you can go with the lazy graphs.



Something wrong goes on with the Delegate graphs. Below I attach an example of a very simple graph where I try to find a spanning tree. Sorry for F# code, in C# I cannot write, only read.
getMEM_filter_simple uses chains_scores_1_read_simple to generate a graph of 99 nodes, with edge i< >i+1 of weight 0 for all i in [0..97] and 4 edges of nonzero weight. This graph has a maximum spanning tree of weight 40.
When I use Kruskal (the weights are set as "minus tag") I get the correct result of 40. If MinimumSpanningTreeKruskal is replaced with MinimumSpanningTreePrim I get a result of 32 with edge 6893 missing from the optimal tree. I tried to replace
FibonacciHeap
with the BinaryQueue and got another result for MST: 24! (Side note, I head to modify in "class BinaryQueue<TVertex, TDistance> : IQueue<TVertex>" IQueue to IPriorityQueue. For some reason it was different than
in FibonacciQueue).
I did not have such a discrepancy between Kruskal and Prim prior to switching to the Delegate graph. Notice that Kruskal has to precompute all the edges in advance so for Kruskal the Delegates does not matter.

(*bug.fs*)
#light
open System
open QuickGraph
open QuickGraph.Algorithms
open QuickGraph.Collections
open System.Collections.Generic
open System.IO
let chains_scores_1_read_simple input_read =
let edge =
match input_read with
19 > seq[new SUndirectedTaggedEdge<int, int> (19, 47, 11);new SUndirectedTaggedEdge<int, int> (input_read, input_read+1, 0)]
61 > seq[new SUndirectedTaggedEdge<int, int> (61, 87, 13);new SUndirectedTaggedEdge<int, int> (input_read, input_read+1, 0)]
39 > seq[new SUndirectedTaggedEdge<int, int> (39, 96, 8);new SUndirectedTaggedEdge<int, int> (input_read, input_read+1, 0)]
68 > seq[new SUndirectedTaggedEdge<int, int> (68, 93, 8);new SUndirectedTaggedEdge<int, int> (input_read, input_read+1, 0)]
98 > Seq.empty
input_read > seq[ new SUndirectedTaggedEdge<int, int> (input_read, input_read+1, 0)]
edge
// let link_squares (squares: List<Rect>) (output_chain:bool)
let getMEM_filter_simple () =
new TryFunc<int, IEnumerable<SUndirectedTaggedEdge<int, int>> >
(
fun i ([<Out>] res: byref <IEnumerable<SUndirectedTaggedEdge<int, int>>> ) >
res < (chains_scores_1_read_simple i )
Printf.printf "edges=%A\n" (Seq.to_array res)
not (Seq.isEmpty res))
let build_graph ()= //: seq<int * Array<int>> ) =
let encoding=new System.Text.ASCIIEncoding()
let num_reads = 99
let edge_enum = getMEM_filter_simple ()
let graph = new DelegateUndirectedGraph<int,SUndirectedTaggedEdge<int, int>>
((seq[0..num_reads  1]), edge_enum, false)
let mst_edges = graph.MinimumSpanningTreeKruskal (fun key > float (key.Tag))
let mutable weight = 0
for edge in mst_edges do
weight < weight + edge.Tag
Printf.printf "Edge %A\n" edge
Printf.printf "Total weight=%d\n" weight
let _ = build_graph()

