Mini-workshop on discrete models and methods

University of Szeged, Faculty of Education


Keynote lecture

Directed hypergraphs and Horn formulas
György Turán

Directed graphs and notions such as reachability have several generalizations to hypergraphs. We consider the version where the head a hyperedge is a single vertex, the tail consists of several vertices and reachability is defined by forward chaining. This corresponds, among other things, to the notion of implication for definite Horn formulas in propositional logic. We discuss questions such as the approximability of minimization and random hypergraphs.
Joint work with Amitava Bhattacharya, Bhaskar DasGupta, Dhruv Mubayi, Despina Stasi and Robert Sloan.

Contributed talks

Optical fiber manufacturing: an application of on-line bin-packing
Csaba Fábian, Richárd Némedi, Zoltán Szőke

In this application, statistical information is available on the weights arriving in a random sequence. This is utilized to improve the quality of the solution.
Optical fibers are made of solid rods of glass called preforms. The ends of the preforms are heated and fibers are drawn from them. The fibers may randomly break during the process. The random lengths of fibers produced are then cut on-line into standard lengths. The cutting process should be directed in a way that minimizes the expected loss due to unsatisfied demand. We show that statistical information about fiber breakage can be utilized in designing the cutting process.

ACO for the integrated vehicle-crew scheduling and rostering problem
David Pas, Andrej Brodnik

We will present an ACO based metaheuristic for the integrated vehicle-crew scheduling and rostering problem in public transport. The integrated vehicle-crew scheduling problem is usually solved using Lagrangian relaxation with column generation. However, because of the complexity of the overall integrated problem, to the best of our knowledge no solution yet exists for the complete integration of all three problems. We will provide preliminary results that show the efficiency of our method.

Online reassignment models for scheduling two uniform machines
György Dósa

In scheduling, online reassignment means, that during the online scheduling process, some jobs can be reassigned from one machine to another (in a bounded way), or a buffer can be used which can store temporarily some jobs, or at the end of the scheduling process some jobs can be reassigned. Clearly, this is some kind of semi-online scheduling. The option of reassignment makes the scheduling problem better handable; if we measure the goodness of an algorithm with competitive ratio (which is some kind of worst case preformance ratio), this means that the semi online condition decreases the competitive ratio what can be reached by an (optimal) algorithm. We treat and compare to each other some different but similar reassignment conditions in case of a (fundamental) scheduling problem.

Online geometrical clustering problems
Csanád Imreh

In the online clustering problems, the classification of points into sets (called clusters) is done in an online fashion. Points arrive one by one at arbitrary locations, to be assigned to clusters at the time of arrival. A point can be assigned to an existing cluster, or a new cluster can be opened for it. The goal is to minimize the sum of costs of the clusters used by the algorithm. The cost of a cluster consist two parts: a setup cost and the cost of the points assigned to the cluster (this can be the diameter or the area of the cluster). We present some algorithms in 1 and 2 dimensions which are analysed by competitive analysis.

The dynamics of (weak) social balance on networks
Gabriel Istrate

We study (using rigorous mathematical techniques) two versions of a dynamical system from the Statistical Physics of Social Balance (Antal, Krapivsky and Redner) . They embody a local adjustment process driven by concepts of (weak) structural balance from the social sciences literature (Heider, Davis)
We view the systems as discrete Markov chains, and settle issues such as reachability and recurrence. We then investigate the dependence of the convergence time of these systems on the network topology.
Our first result points out that the correct object that is determining for convergence speed is not network topology but an associated topological object we call "triadic dual". This allows us to show that for the both system mixing is fast on one-dimensional graphs (so-called "triadic cycles").
For the first system we show that in the particular case the triadic dual is a graph, a certain "Cheeger constant" associated to this graph determines the speed of convergence. We compare this rigorous result (in the case of the triangular lattice) with existing experimental findings from the statistical physics literature (Raddichi et al. PRE 2007).
For the dynamics driven by weak social balance we first analyze (using tools from Statistical Mechanics) the phase transition between a harmonious state (where positive relations prevail) and an "atomized" state (where no positive relations are present). Next, we present evidence that the convergence of this dynamics is sometimes faster than that for "ordinary" social balance.

Branching rules in clique search for parallel algorithms
Sándor Szabó

An optimization problem typically performs an optimum test and depending on the outcome either terminates or forks into "smaller" instances. In order to gain insight into the performance of the algorithm it is customary to focus on three critical factors. Firstly, a thorough inspection of the initial problem. Secondly, the quality of estimates on which the stopping rule is based. Thirdly, the branching rule that creates the new subproblems to work with. In this short talk we will have a look at the roles of the branching rules motivated by algorithm working in high performance environment.

Using Clustering Methods to Reduce the Number of Association Rules
Branko Kavsek

Association rule mining is one of the most popular data mining techniques. However, the result of association rule mining algorithms is often a very large set of rules, leaving the analyst with the task to go through all the rules and discover interesting ones. Sifting manually through large sets of rules is time consuming and strenuous. Besides support and confidence that are the de-facto "interestingness measures" used for discovering relevant association rules, many other interestingness measures were proposed by different authors. The aim of the proposed measures is to reduce the number of association rules found by some algorithm and thus keeping the size of the model inside manageable limits. We hereby propose a slightly different approach to association rule number reduction that takes into account the structure of association rules and then uses clustering methods to group together similar association rules (according to their structure). A novel structure similarity measure for association rules is proposed for the purpose. The reduction in the number of association rules can be then achieved by simply showing just the association rules that represent each cluster (i.e. the so called "cluster representatives") and hide all the other rules. This approach proved non-significantly different from other "interestingness measures based" approaches regarding the number of generated rules and the quality of these rules. Moreover, by grouping similar association rules together and analysing each group at will we can gain additional insight in the data.

Smart PAC learners and the co-training model
Balázs Szörényi

In a machine learning task it is a clear advantage to know the underlying distribution from which the random examples are drawn. Still, as Simon has proved recently, (in the PAC framework) one can also construct "universal" algorithms ("smart learners"), that can be used with any distribution, and have a performance comparable with the best learners specialized for that specific distribution. However, the proof is non-constructive, and it is not clear how to construct such a generally applicable algorithm. This suggests to consider special cases of the learning problem. In this talk I analyze one such special case, the co-training model (a widely investigated model in semi-supervised learning), and present smart learners for this framework.

Finding the most probable Role Structure in Social Networks
Stefan Wiesberg

In network analysis, an acknowledged way to obtain structural information about the network is the partitioning of the vertices into so-called role classes. They were mathematically formalized by sociologists in the 1970's and have applications in various fields today, including economical, chemical, biological and consumer research networks.
A question of practical importance is the following one: Given a network G and a finite set of ideal role structures R, which structure in R is closest to the actual structure of G? The question can hardly be answered for the most popular role class definition: regular equivalence. We present one of the first exact algorithms for this problem. It is based on an IP formulation and solved via branch-and-cut. Significant running-time improvements compared to currently used methods are reported.

Results for some variants of on-line bin packing
János Balogh, József Békési, and Gábor Galambos

We review our new results in the area of on-line and semi-on-line bin packing. We describe our bounds for on-line and semi-on-line bin packing problems based on our papers.
Then we improve the lower bound for the asymptotic competitive ratio of any on-line (one-dimensional) bin packing algorithm which knows the optimum value in advance. In the bin packing literature, a method for packing patterns is often used to provide new lower bounds for different on-line bin packing problems. Here we extend the method of packing patterns, giving a new construction for the LP based on packing patterns. The new construction is based on "branching" (input) list sequences. We hope that this type of construction can be applied to other problem as well.
Finally, we give some new bounds for the online variable sized bin packing problem focusing on the case if it is allowed to choose the bin sizes. In this case an asymptotic worst-case ratio of at most 1.104 can be guaranteed with at least seven (well chosen) bin sizes.

Parallel Ant Colony Optimization on the GPU
Marko Grgurovic, Andrej Brodnik

We study the parallel version of the ant colony heuristic run on the graphics processing unit (GPU) solving traveling salesman problem. We focus exclusively on an Nvidia CUDA implementation. Following series of recent results for the GPU model, we introduce a novel approach to updating the pheromone matrix without atomic operations that improves currently the best known non-atomic update method. The later uses n^2 threads each performing n^2 memory accesses. In contrast, our method still uses n^2 threads but each of them performs only n memory accesses. Further, we also present an empirical comparison between the three approaches to the pheromone matrix update: atomic, non-atomic, and non-atomic improved. For the empirical study we use instances from TSPLIB.

The inverse problem of the D-R infection model
András Bóta, Miklós Krész, András Pluhár

Studying virus marketing Domingos and Richardson [1] introduced the so-called Independent Cascade model. Kleinberg, Kempe and Tardos have proven in [2] that this model is an equivalent form of Granovetter’s Linear Threshold model [3]. In that model a small set, the initial adopters (infected vertices) infect neighboring vertices with prescribed probabilities in stages. This model has been extended and applied to many other real-world examples with considerable success [4].
However, these models require that the probabilities of infection assigned to the edges are known a priori. Of course, this information is not provided in most applications, i.e. the probabilities must be estimated somehow. The estimations are usually done with some intuition-guided trial and error and using other vertex or edge functions of the network. Even though this crude approach greatly improves the earlier models for estimating credit default or churn, it is definitely sub-optimal, so more systematic methods are needed.
To handle this inverse infection problem, we divide the data to learning and test sets. Then we try to assign edge probabilities such that the model results in infection patterns similar to the learning set, while we evaluate the overall process by the test set. Usually not the edge probabilities themselves are estimated, but their dependences on other available information, such as the previous behaviors of nodes, like in [5]. In our case these are vertex or edge attributes. Mathematically we face with various optimization problems, where the objective functions are known only implicitly.
It is clear that this task has a very high number of local optima, and finding the global optimum is almost impossible. On the other hand it is reasonable to try to discover as much information as possible about the search space of the problem in order to develop better algorithms to solve it. In this presentation, we study different measures of goodness or error, and try to discover, or at least estimate the parameter surface of the problem, depending on the used functions.
[1] P. Domingos and M. Richardson, Mining the network value of customers. In: Proc. 7th Intl. Conf. on Knowledge Discovery and Data Mining. (2001) 57-66.
[2] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proc. KDD, 2003.
[3] M. Granovetter, Threshold models of collective behavior. American J. of Sociology 83 (1978) 1420–1443.
[4] A. Csernenszky, Gy. Kovács, M. Krész, A. Pluhár and T. Tóth, The use of infection models in accounting and crediting. Proc. of the Challenges for Analysis of the Economy, the Business, and Social Progress International Scientific Conference, Szeged, November 19-21, 2009.
[5] Amit Goyal, Francesco Bonchi and Laks V. S. Lakshmanan: Learning influence probabilities in social networks, Proceedings of the third ACM international conference on Web search and data mining. New York, NY, USA 2010.

Detecting equilibria in non-cooperative games using an evolutionary approach
Dan Dumitrescu, Rodica Ioana Lung, Noémi Gaskó

Generative relations and an evolutionary method based on generative relations are presented in order to detect several equilibria. New, combined, so called joint equilibria are also presented. The new joint equilibria allow players to play in different ways, so it can be a better model for real-world situations. A comparison between payoffs is investigated and interpreted.

Equitable equilibrium in games
Réka Nagy

A problem most frequently encountered in classical multi-criteria optimization is that the set of solutions considered by the optimization process is an infinite set, making the selection of a unique preferred decision quite difficult. Considering models with equitable efficiency relieves some of the burden from the decision maker by shrinking the solution set. It turns out that the set of equitably efficient solutions is contained within the set of efficient solutions for the same problem. The above mentioned problem is even more actual in game theory. The most popular solution concepts are Nash and Pareto equilibrium. There equilibrium concepts have some limitations when applied in real life. In most of the cases Nash equilibrium is the most rational choice but it rarely assures maximal payoff. The Pareto equilibrium is often a very large set of solutions that is very hard to process. Our aim is to introduce a type of equilibrium that provides a smaller set of solutions, assures the greatest possible payoffs and is equitable for all players of the game.

Many-objective Optimization using Resonance Search
Mihai Suciu, Dan Dumitrescu

The application of standard dominance-based EMOAs to many-objective problems is drastically limited by the tendency of feasible solutions to become non-dominated when the number of objectives increases. The selection pressure towards the Pareto set is thus severely deteriorated. The Pareto Front may be approximated by aggregating the search results in several fitness landscapes that are considered as instances of a large dynamic landscape (representing a good approximation of the Pareto Set). A multi-population evolutionary technique is investigated in order to solve many-objectives optimization problems. Numerical experiments indicate the effectiveness of the proposed approach.

The Target Visitation Problem
Achim Hildenbrandt

The Target Visitation Problem is a combination of the Traveling Salesmen Problem and the Linear Ordering Problem. The objective is to find a tour which is optimal in sense that sum of the met preference minus the distance cost is maximal. Since this problem has been barely investigated so far we will give an IP -formulation first. Then we will present some results concerning different classes of facets, zero lifting conditions and the dimension of the polytope. We will use these results for designing a branch and cut algorithm for the exact solving of the TVP.

Cheating prevention in multiplayer online games
Timo Knuutila and Jouni Smed

Game industry is one of the strongest growing branches of software production. Over 60% of computer game players compete with others online, which means that the significance of a fair, cheater-free playground has a significant impact on the games industry serving them. With the advent of Internet gaming and massively multiplayer on-line games, cheating has become a major problem for the game organizers: a game riddled with cheaters does not attract players but loses quickly its business.
Early cheating methods focused on compromising the game software or tampering with the network traffic. Since these kinds of low-level attacks were easy to counteract, cheating has taken much subtler forms. Because forged data can be detected automatically, cheating takes now place inside the game world: The cheater acts according to the rules of the game but adjusts his behavior in the game according to some hidden agenda. This hidden agenda is usually defined as collusion, which is covert collaboration forbidden by the rules of the game. Although collusion detection and prevention in multiplayer online games has been recognized as an important research topic, it has not yet been approached systematically apart from our research group’s efforts.
We present an overview of our team's efforts on collusion detection so far and sketch out our current research aims.

Solving Bounded Model Checking Problems by a Multi-Domain Logic Based SAT Solver
Gábor Kusper, Gergely Kovásznai, Csaba Biró

We discuss the advantages of using Multi-Domain Logic (MDL - a version of signed logic) for solving Bounded Model Checking (BMC) problems. On one hand, MDL can encode BMC problems in a more compact and natural way. On the other hand, the resulting MDL problems can be solved faster then their boolean versions using our SAT solver.
Many software and hardware verification problems are reduced to Boolean satisfiability (SAT) of relatively large formulae. Moreover, various formal verification techniques, notable model checking and in particular bounded model checking generate satisfiability problems which are not genuinely boolean, but they have more natural encoding in signed propositional logic (in which a variable may have more than two values).
Currently these problems are reduced to boolean SAT because the availability of SAT tools, however this involves a certain loss of information (wrt. the original problem) and also a certain loss of efficiency.
Multi-Domain Logic (MDL) is a generalization of signed logic, in which every variable has its own domain. This aspect increases the efficiency of direct solving of MDL satisfiability, because the solving process proceeds by reducing the size of the domains (contradiction appears as an empty domain). In contrast to the usual approach of translating signed logic satisfiability into Boolean satisfiability, we implement the generalized DPLL directly for MDL, using a specific version of the techniques used for signed logic.
Moreover, we use a novel technique, called variable merging, which consists in replacing two or more variables by a new one, whose domain is the cartesian product of the old domains. This operation is used during the solving process in order to reduce the number of variables. Moreover, variable merging can be used at the beginning of the solving process in order to translate a boolean SAT problem into an MDL problem.
We have also created a preprocessor using Xtext technology, which is able to translate a BMC problem, encoded in a subset of NuSMV format, into an MDL problem.

Finite and Pushdown Automata with Tranclucent Letters*
Benedek Nagy

It is well-known that finite automata accept the regular languages and (nondeterministic) pushdown automata accept the context-free languages. In this talk we consider extensions of these models. In each state there is a subset of the alphabet that contain translucent letters, the machine do not see them in this state. In this way the automaton can access not only the very first/next letter of the input, the process does not go in strictly left to right order, in other state the machine can read letters that were transclucent before.
We show that finite state machines with thranslucent letters are quite expressive, they accept all rational trace-languages and some non context-free languages can be accepted by them. The determinsitic variant is less powerful, athough all regular and some non regular and non context-free languages are accepted by them. The pushdown automata that are extended in the same manner accept all context-free trace-languages. These models accept only semi-linear languages. Other extension of the pushdown automata, when the machine has translucent letters in the stack and therefore not only the topmost pusdown symbol can be accessed in some steps, are universal in the sense that all recursively enumerable languages are accepted by them.

* The talk summarizes some results of a MÖB-DAAD bilateral project including University of Debrecen and University of Kassel (Friedrich Otto and Marcel Vollweiler). The project was also supported by the TÁMOP 4.2.1/B-09/1/KONV-2010-0007 project, which is implemented through the New HungaryDevelopment Plan, co-financed by the European Social Fund and the European Regional Development Fund.

Analysis and Data Mining of the Hungarian Common Sense Database using Kohonen's Self Organizing Networks
Tibor Tajti, Emőd Kovács, and Gábor Kusper

The Hungarian Common Sense project collects those Hungarian sentences, which are commonly agreed to be valid, like “Dogs can yelp”. The sentences are stored in a database using ontology based structure. We analyse this database using a Kohonen network for clustering and to find the central notions in the database. This helps us to select those notions which are possibly underweighted and we can ask our students to give their common sense about them. The analysis and data mining also can help to make findings about relations between notions or areas in the knowledge space.

An integration of vehicle and crew scheduling with crew rostering in Public transport company
Nebojsa Gvozdenovic

Traditionally, planing in bus public transport is divided into 5 phases: the planning of bus lines, time tabling, vehicle scheduling, crew scheduling and crew rostering. We consider the partial integration of the last three.
Our approach is based on using driver groups and the rule that during a day a vehicle can be driven only by drivers belonging to a same group. In other words, in scheduling process, we can assign vehicles to groups. In our model, the main characteristic of a drivers group is its sublist of weekly timetables (Sunday's, Saturday's or working day's) it can appear in. It simply specifies possibilities of realization of a timetable by a group. As a major requirement,we have that the number of vehicles assigned to a group has to be the same for all weekly time tables appearing in the group's sublist. We propose a heuristic for integrated vehicle and crew scheduling that complies with the requirement and report results on real data for the suburban bus lines in the Public transport company Novi Sad.

The bus rescheduling problem: models and algorithm
Viktor Árgilán, Balázs Dávid, Miklós Krész

In the area of optimization of public transportation, the topic of recovery from disruptions and the rescheduling of vehicles is a considerably new research field. Companies usually use a backup vehicle from the depot to address these problems, though better costs might be achieved with a proper solution algorithm. Because of its relatively new status, literature related to the above topic is scarce. This presentation gives an overview of existing methods dealing with rescheduling, and presents an exact model for the multiple depot vehicle rescheduling problem in public bus transportation. Since the size of the presented model does not support fast solution by exact methods, an efficient heuristic approach is also given.