Seminars

INF_Onderzoek_17354_head_large.jpg
Seminars
The last Tuesday of the month at 11.00 hours a 'Open University Research Seminar Informatica' (OUrsi) takes place in heerlen. The OUrsi features presentations on ongoing research in all areas of Computer Science. The presentations are normally streamed online for the benefit of interested parties elsewhere. 

If you are interested in giving a presentation, or attending one (live in Heerlen or online), please contact Hugo Jonker. Below are listed the scheduled presentations, with the upcoming OUrsi

Presentations

No. Date Presenter Title  
59. 7 September 2021  Fabian van den Broek Getting attribute-based signatures in the legal framework  
  Abstract:
Digital signatures were invented in the 70s. In The Netherlands, digital signatures have been recognised by law since the 2000s. A special variant of digital signatures, attribute-based signatures, was first proposed in 2010. Since then, various studies have explored the possibilities of attribute signatures and enriched their application area. For example, attribute-based signatures are incorporated into the IRMA framework to enable non-interactive proofs of attributes. This has brought attribute-based signatures (literally) into the hands of many ordinary citizens. However, to date, attribute-based signatures are not specifically mentioned in the law. In this presentation, we discuss the advantages and limits of using attribute-based signatures by means of several case studies. Based on these case studies, we analyze whether and to what extent attribute-based signatures fit within current legal regulations. We consider both regular attribute-based signatures and anonymous attribute-based signatures. The latter are of particular interest, as the law defines signatures as containing an identifying aspect -- which is absent in these anonymous signatures.
 
58. 7 July 2021 Gideon Maillette de Buy Wenniger Efficient deep learning using richer context, with applications to handwriting recognition and scholarly document quality predictione  
  Abstract:
Using richer context is one of the driving forces behind many advances in the performance of deep learning models, and before the advent of deep learning, it has been for other machine learning approaches also. My interest in richer context for machine learning models goes back to my PhD [1,2,3], in which reordering context labels were applied to improve translations produced by a hierarchical statistical machine translation model. More recently I worked on neural handwriting recognition [4], in which multi-dimensional long short-term memories (MDLSTMs) are applied to encode images while considering the full context of all other pixels in the image. In this application computational efficiency is crucial, and I developed various new strategies to increase it. The second application I will cover is scholarly document quality prediction. In this task, some property indicative of or correlated to quality is predicted from the textual and/or visual contents of a scholarly document alone. This property can be 1) the acceptance/rejection of a scientific paper in a journal or major conference, or 2) the log of the number of citations. The latter property, while imperfect as a notion of quality, is attractive because of its availability for huge amounts of scientific publications. I will discuss two new approaches presented in my recent work, that leverage ways to increase context for quality prediction models. The first approach [5] does so by incorporating document structure information to prediction, and the second approach [6] by full text encoding facilitated by the use of a BERT+GRU model in combination with chunking of the full document text. Using these approaches, new state-of-the art results were achieved for accept/reject prediction and log number of citation prediction tasks. I will end my talk with an outlook on my current ongoing work to make scholarly document quality prediction more explainable using saliency maps and other explainable deep learning methods. The work on hierarchical statistical machine translation, only mentioned briefly, is joint work together with my former PhD supervisor Khalil Sima'an. The work on handwriting recognition is joint work with Lambert Schomaker and Andy Way. Finally, the work on scholarly document quality prediction is joint work with Thomas van Dongen, Lambert Schomaker, Elerei Aedmaa, Herbert Kruitbosch, and Edwin Valentijn. See the below list of references for more information about the research.

 

57. 22 June 2021 Hans van Ditmarsch Gossip and Knowledge  
  Abstract:
A well-studied phenomenon in network theory since the 1970s are optimal schedules to distribute information by one-to-one communication between nodes that are connected in the network. One can take these communicative actions to be telephone calls, and protocols to spread information this way are known as gossip protocols or epidemic protocols. A common abstraction is to call the information of each agent its secret, and that the goal of information dissemination is that all agents know all secrets: that is the termination condition of gossip protocols. Following investigations assuming a global scheduler, it is now typically assumed that gossip protocols are distributed in some way, where the only role of the environment is to ensure randomization. Statistical approaches to gossip have taken a large flight since then, wherein network topology is an important parameter. In epistemic gossip protocols, an agent (node) will call another agent not because it is so instructed by a scheduler, or at random, but based on its knowledge or ignorance of the distribution of secrets over the network and of other agents' knowledge or ignorance of that. One such protocol requires that an agent may only call another agent if it does not know the other agent's secret. Epistemic features of gossip protocols may affect their termination, the (order of complexity) expectation of termination, their reachability (what distributions of secrets may occur before all agents know all secrets), and so on. Variations involve agents exchanging telephone numbers in addition to agents exchanging secrets (which results in network expansion), or agents exchanging knowledge about secrets; we may also assume common knowledge of the protocol; further generalizations would involve multi-casts. We present a survey of distributed epistemic gossip protocols.

Slides

56. 15 June 2021 Joshua Moerman Weighted Register Automata  
  Abstract:
Register automata are an extension of automata that deal with data, possibly from an infinite domain. Many algorithms from classical automata theory remain to work on deterministic register automata. However, the nondeterministic register automata are strictly more expression and some properties become undecidable. Motivated by a recent result that the subclass of unambiguous register automata has decidable equivalence, we investigate weighted register automata, the common generalisation of register automata and weighted automata. These include unambiguous register automata. We show that equivalence is also decidable for these automata. In our view, the main contribution is the development of the mathematical theory of so-called orbit-finitely spanned vector spaces, on which our decision procedure is based. This is joint work with Mikołaj Bojańczyk and Bartek Klin.

Slides

55. 25 May 2021 Nico Naus Exploit Logic  
  Abstract:
Automatic exploit generation is a relatively new area of research. Work in this area aims to automate the manual and labour intensive task of finding exploits in software. With Exploit Logic, we are aiming to lay the foundation for a new automatic exploit generation system. Exploit Logic, which formally defines the relation between exploits and the preconditions which allow them to occur. This relation is used to calculate the search space of preconditions for a specific program and exploit. This presentation is an introduction into Exploit Logic, and the current state of my research.

Slides

54. 20 April 2021 Frouke Hermens Data driven group comparisons of eye fixations to dynamic stimuli  
  Abstract:
Recent advances in software and hardware have allowed eye tracking to move away from static images to more ecologically relevant video streams. The analysis of eye tracking data for such dynamic stimuli, however, is not without challenges. The frame-by-frame coding of regions of interest (ROIs) is labour intensive, and computer vision techniques to automatically code such ROIs are not yet mainstream, restricting the use of such stimuli. Combined with the more general problem of defining relevant ROIs for video frames, methods are needed that facilitate data analysis. Here we present a first evaluation of an easy-to-implement data-driven method with the potential to address these issues. To test the new method, we examined the differences in eye movements of self-reported politically left- or right-wing leaning participants to video clips of left- and right-wing politicians. The results show that our method can accurately predict group membership on the basis of eye movement patterns, isolate video clips which best distinguish people on the political left-right spectrum and reveal the section of each video clip with the largest group differences. Our methodology thereby aids the understanding of group differences in gaze behaviour, and the identification of critical stimuli for follow-up studies or for use in saccade diagnosis.

Slides

53. 23 March 2021 Vincent van der Meer Deducing File History from NTFS timestamps  
  Abstract:
Proper attribution is a key factor in digital forensics. In this paper, we investigate one such area of attribution: file timestamp history. This is crucial for determining whether a file could have been accessible to a suspect. In particular, we determine file history given a file's current NTFS timestamps. That is, we infer all possible sequences of file system operations which culminate in the file's current NTFS timestamps. To this end, we provide an exhaustive overview of such operations and their effect on timestamps, clarifying inconsistencies in literature. Based upon this overview, we propose a method for reconstructing possible histories of operations, accounting for various forms of timestamp forgery. We provide an implementation of this method, that depicts possible histories graphically.

Slides

52. 9 March 2021 Stef Joosten Programming with relation algebra's  
  Abstract:
Why use relation algebra as a programming language? Well, besides being cool for using formal specifications directly for programming, it gives neat language properties such as declarative semantics, strong typing, reactive programming, and calculating with programs. And it solves real engineering problems that will help solve the software crisis (Fred Brooks, The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, 1975). This presentation gives an overview of the language Ampersand, its formal foundations and its applications in industry. It contains a demonstration. We end by discussing some research challenges that are stil open.

Slides

51. 23 February 2021 Olivia Rodríguez Valdés Towards a testing tool that learns to test  
  Abstract:
GUI testing is a type of testing that verifies software as a whole through its GUI, interacting with widgets and testing the System Under Test (SUT) from the user's perspective. Guiding the testing process of the SUT in a smarter way should lead to an enhancement of error detection/recognition. We will study the application of Reinforcement Learning techniques in automated GUI testing. Using the scriptless GUI testing tool TESTAR as a vehicle, we will focus our research on improving the action selection mechanisms, that guide the learning process. We will tailor rewards towards the goal of enhancing the effectiveness and efficiency of testing.

Slides

50. 26 January 2021 Luc Edixhoven Software technology for automated feedback generation  
  Abstract:
Parallelisation is becoming ever more important in modern-day computing and, consequently, also proper communication between different threads, processes and machines. However, existing static methods to verify the safety of message passing-based communication protocols are generally limited in their expressiveness. As a result, many protocol implementations which are, in fact, safe, cannot be formally verified as such. Our aim is to find an expressive formal basis which is able to support, if possible, any safe protocol. Our first step in this adventure deals with regular protocols; we base our approach on formal language theory, by enriching regular expressions with a parameterised interleaving operator.

Slides

49. 8 December 2020 Bastiaan Heeren Software technology for automated feedback generation  
  Abstract:
Intelligent Tutoring Systems (ITSs) that offer well-designed feedback have the potential to enhance education in many problem domains. However, the cost of developing such systems is known to be high, with an estimated 200 hours of development for one hour of practice. The Ideas research project aims at simplifying ITS construction by using software technology for automated feedback generation. Examples of such techniques are embedded domain-specific languages, combinators and their algebraic laws, (datatype-)generic programming, and rewriting. Over the past 14 years, our approach has been tested on many problem domains, and has supported the exploration of new feedback opportunities. In this presentation, I will explain two techniques: the embedding of light-weight rewrite rules in problem-solving procedures, and the use of generic traversals for controlling rule application. The techniques will be illustrated by a simple problem-solving procedure for rewriting propositions into negation normal form.
Slides

 

48. 24 November 2020 Alaaeddin Swidan Learning programming for school-age children: challenges and opportunities  
  Abstract:
There are currently a variety of programming environments and tools meant to teach children programming, as one aspect of a larger goal of developing their computational thinking. Block-based programming is one of the popular choices to introduce programming to children. Due to the low ceiling that such environments provide, there comes a point where students are introduced to other "traditional" programming languages. With this learning path in mind, in addition to the idea that learning programming is difficult, there arise many challenges that children and educators face. In this presentation we will focus on two of them: challenges to achieve higher conceptual development in block-based languages and the efforts needed for a successful transfer of knowledge from block based to textual programming languages.
Slides
47. 3 November 2020 Sung-Shik Jongmans Discourje: Runtime Verification of Communication Protocols in Clojure  
  Abstract:
This talk presents Discourje: a runtime verification framework for communication protocols in Clojure. Discourje guarantees safety of protocol implementations relative to specifications, based on an expressive new version of multiparty session types. The framework has a formal foundation and is itself implemented in Clojure to offer a seamless specification-implementation experience. Benchmarks show Discourje's overhead can be less than 5% for real/existing concurrent programs.
Slides
46. 14 October Stefano Bromuri Deep Learning applied to finance and logistics    abstract slides  
  Abstract:

This presentation is meant to be a highlight concerning the activities of the CAROU team in the Heerlen campus for all computer scientists that may be interested in participating. Specifically, this presentation starts by revisiting the concepts of machine learning (ML) and deep learning (DL). After discussing the building blocks of ML and DL, the presentation then focuses on three projects in the AI domain currently carried out by CAROU with and for industrial partners in the Heerlen campus. The three domains considered are emotion analytics (APG), logistics (DHL) and customer segmentation (APG). Ethical considerations are discussed for each of the three projects.
Slides

45. 29 September 2020 Marloes Venema (Radboud University) Systemizing Attribute-Based Encryption -- In Search of Practical Decentralized ABE  
 

Abstract:
Attribute-based encryption (ABE) cryptographically implements fine-grained access control on data. As such, data can be stored by an entity that is not necessarily trusted to enforce access control, or an entity that is not even trusted to have access to the plaintext data at all. Instead, access control can be externally enforced by a trusted entity. For instance, you can store your personal data in the cloud without the storage servers getting actual access to it. You can also manage access to the data, and share it with the people with whom you want to share that data. Additionally, some multi-authority variants of ABE -- which do not have a central authority -- can effectively and securely implement access control in multiple-domain settings. We show that, despite the amount of work that has been done in the area of ABE, a practical and privacy-friendly scheme does not exist yet. To make ABE more practical and privacy friendly, we outline important open problems.
Slides

44. 15 September 2020 Christof Ferreira Torres
(Université du Luxembourg)
Ægis: Shielding Vulnerable Smart Contracts Against Attacks  
  Abstract:
In recent years, smart contracts have suffered major exploits, costing millions of dollars. Unlike traditional programs, smart contracts are deployed on a blockchain. As such, they cannot be modified once deployed. Though various tools have been proposed to detect vulnerable smart contracts, the majority fails to protect vulnerable contracts that have already been deployed on the blockchain. Only very few solutions have been proposed so far to tackle the problem of post-deployment. However, these solutions suffer from low precision and are not generic enough to prevent any type of attack. In this work, we introduce Ægis, a dynamic analysis tool that protects smart contracts from being exploited during runtime. Its capability of detecting new vulnerabilities can easily be extended through so-called attack patterns. These patterns are written in a domain-specific language that is tailored to the execution model of Ethereum smart contracts.The language enables the description of malicious control and data flows. In addition, we propose a novel mechanism to streamline and speed up the process of managing attack patterns. Patterns are voted upon and stored via a smart contract, thus leveraging the benefits of tamper-resistance and transparency provided by the blockchain. We compare Ægis to current state-of-the-art tools and demonstrate that our solution achieves higher precision in detecting attacks. Finally, we perform a large-scale analysis on the first 4.5 million blocks of the Ethereum blockchain, thereby confirming the occurrences of well reported attacks, as well as uncovering unreported attacks in the wild.
Slides
43. 12 May 2020 Stijn de Gouw Verifying OpenJDK's LinkedList using KeY  
  Abstract:
As a particular case study of the formal verification of state-of-the-art, real software, we discuss the specification and verification of a corrected version of the implementation of a linked list as provided by the Java Collection framework.
Slides
42. 11 May 2020 Clara Maathuis Social Manipulation Campaigns  
  Abstract:
The increased media attention towards the growing number of social manipulation campaigns that aimed at influencing or damaging perceptions, processes, and systems, succeeded in rerouting and rewriting thoughts and agendas of academic researchers, professionals, and decision makers in the last years. This implied that after a long struggle, wearing and seeing through offensive lenses could be perceived as the right approach to advancing effective, efficient, and adaptive intelligent solutions, programs, and strategies for defence, deterrence, and resilience purposes when fighting such threats. However, these measures should embed in their design phase socio-technical constraints, complexities, requirements, and values, and should further consider hybrid implementation and evaluation methods. In these lines, this presentation aims at i) addressing this phenomenon, and ii) discussing some incidents and solutions to countering such threats.
Slides
41. 14 April 2020 Freek Verbeek Formal Proofs of Return Address Integrity  
  Abstract:
We present a methodology for generating a characterization of the memory used by an assembly program, as well as a formal proof that the assembly is bounded to the generated memory regions. A formal proof of memory usage is required for compositional reasoning over assembly programs. Moreover, it can be used to prove low-level security properties, such as integrity of the return address of a function. Our verification method is based on interactive theorem proving, but provides automation by generating pre- and postconditions, invariants, control-flow, and assumptions on memory layout. As a case study, three binaries of the Xen hypervisor are disassembled. These binaries are the result of a complex build-chain compiling production code, and contain various complex and nested loops, large and compound data structures, and functions with over 100 basic blocks. The methodology has been successfully applied to 251 functions, covering 12,252 assembly instructions.
Slides
40. 5 November 2019 Sung-Shik Jongmans Distributed programming using role-parametric session types in Go  
  Abstract:
In this talk, I'll present a framework for the static specification and safe programming of message passing protocols where the number and kinds of participants are dynamically instantiated. More precisely, we developed the first theory of distributed multiparty session types (MPST) to support parameterised protocols with indexed roles; our framework statically infers the different kinds of participants induced by a protocol definition as role variants, and produces decoupled endpoint projections of the protocol onto each variant. This enables safe MPST-based programming of the parameterised endpoints in distributed settings: each endpoint can be implemented separately by different programmers, using different techniques (or languages). We proved the decidability of role variant inference and well-formedness checking, and the correctness of projection. We implemented our theory as a toolchain for programming such role-parametric MPST protocols in Go. Our approach is to generate API families of lightweight, protocol- and variant-specific type wrappers for I/O. The APIs ensure a well-typed Go endpoint program (by native Go type checking) will perform only compliant I/O actions w.r.t. the source protocol. We leveraged the abstractions of MPST to support the specification and implementation of Go applications involving multiple channels, possibly over mixed transports (e.g., Go channels, TCP), and channel passing via a unified programming interface. We evaluated the applicability and run-time performance of our generated APIs using microbenchmarks and real-world applications. This is joint work with David Castro, Raymond Hu, Nicholas Ng, and Nobuko Yoshida; the talk is based on a paper that was published in POPL 2019 (https://dx.doi.org/10.1145/3290342).
Slides
39. 24 September 2019 Hugo Jonker Fingerprint surface-based detection of web bot detectors  
  Abstract:
Web bots are used to automate client interactions with websites, which facilitates large-scale web measurements. However, websites may employ web bot detection. When they do, their response to a bot may differ from responses to regular browsers. The discrimination can result in deviating content, restriction of resources or even the exclusion of a bot from a website. This places strict restrictions upon studies: the more bot detection takes place, the more results must be manually verified to confirm the bot's findings. To investigate the extent to which bot detection occurs, we reverse-analysed commercial bot detection. We found that in part, bot detection relies on the values of browser properties and the presence of certain objects in the browser's DOM model. This part strongly resembles browser fingerprinting. We leveraged this for a generic approach to detect web bot detection: we identify what part of the browser fingerprint of a web bot uniquely identifies it as a web bot by contrasting its fingerprint with those of regular browsers. This leads to the fingerprint surface of a web bot. Any website accessing the fingerprint surface is then accessing a part unique to bots, and thus engaging in bot detection. We provide a characterisation of the fingerprint surface of 14 web bots. We show that the vast majority of these frameworks are uniquely identifiable through well-known fingerprinting techniques. We design a scanner to detect web bot detection based on the reverse analysis, augmented with the found fingerprint surfaces. In a scan of the Alexa Top 1 Million, we find that 12.8% of websites show indications of web bot detection.
Slides
38. 5 July 2019 Christof Ferreira Torres The Art of The Scam: Demystifying Honeypots in Ethereum Smart Contracts  
  Abstract:
Modern blockchains, such as Ethereum, enable the execution of so-called smart contracts -- programs that are executed across a decentralised network of nodes. As smart contracts become more popular and carry more value, they become more of an interesting target for attackers. In the past few years, several smart contracts have been found to be vulnerable and thus exploited by attackers. However, a new trend towards a more proactive approach seems to be on the rise, where attackers do not search for vulnerable contracts anymore. Instead, they try to lure their victims into traps by deploying vulnerable-looking contracts that contain hidden traps. This new type of contracts is commonly referred to as honeypots. In this paper, we present the first systematic analysis of honeypot smart contracts, by investigating their prevalence, behaviour and impact on the Ethereum blockchain. We develop a taxonomy of honeypot techniques and use this to build HONEYBADGER: a tool that employs symbolic execution and well defined heuristics to expose honeypots. We perform a large-scale analysis on more than 2 million smart contracts and show that our tool not only achieves high precision, but also high scalability. We identify 690 honeypot smart contracts as well as 240 victims in the wild, with an accumulated profit of more than $90,000 for the honeypot creators. Our manual validation shows that 87% of the reported contracts are indeed honeypots.
Slides
37. 18 June 2019 Greg Alpár How resistance can turn into curiosity? -- The Open Maths approach  
  Abstract:
Open Maths is an initiative, based on state-of-the-art research results, to make mathematics accessible and enjoyable for students in the higher education and in life-long learning. In this talk, I will give an overview about the first two Open Maths courses and their impact on students. Be prepared for some surprises while we delve deeper in one particular aspect of this new approach!
36. 28 May 2019 Vincent van der Meer Investigating File Fragmentation  
  Abstract:
In digital forensics, data recovery tools are used to find lost or deleted evidence. One popular approach is file carving, which recovers files based on their contents instead of metadata (such as a file system). To recover fragmented files, file carvers must make assumptions about fragmentation of files on the examined devices. There has been one large- scale study on file fragmentation, on devices from 1998-2006. Significant advances in hard- and software since then more than merit a new study. In this paper, we examine file fragmentation of 220 systems (334 drives), that serve as the primary computer of their owners. To this end, we have devised a privacy-friendly approach to acquiring the data. Only meta- information of the file system was collected. Moreover, meta-information was pro-actively filtered to avoid any potential privacy-sensitive issues. Our results indicate that the average level of fragmentation is lower com- pared to previous studies. Interestingly, we find that fragmentation not only concerns gaps between blocks of the same file, but also the order of these blocks. In particular, we find that a type of fragmentation where a later block is stored on disk before an earlier block of the same file, occurs in nearly half of all fragmented files. This type of out-of-order fragmen- tation is rarely discussed in literature, yet its high rate of occurrence should impact the practice of file carving.
35. 29 January 2019 Frank Tempelman The Semi-Direct Control System - a vehicle for AI research  
  Abstract:
In mijn voordracht geef ik een overzicht van mijn mogelijke onderzoeksonderwerpen op met name het gebied van kunstmatige intelligentie en dan vooral, gegeven mijn achtergrond en contacten, in combinatie met luchtvaart. Aan de orde komt het Semi-Direct Control System, een alternatief besturingssysteem voor onbemande vliegtuigen (UAVs), voornamelijk toepasbaar in zogeheten 'dog fights' op grote afstand, waarbij vliegtuigen elkaar op korte onderlinge afstand te lijf gaan. Het probleem is in deze dat een direct controlesysteem, waarbij de vlieger op vaak duizenden kilometers afstand rechtstreeks op basis van sensorinformatie van het UAV met zijn controle-apparatuur het ding bestuurt, te traag is. Dat zou op te lossen te zijn door het UAV een veel grotere mate van autonomie te geven: geef hem meer intelligentie en je hoeft hem niet steeds van commando's te voorzien, want hij kan het zelf. Zover zijn we echter nog niet, en het SDCS vormt een tussenweg, waarbij de frequentie van commando's een trade-off vormt met de benodigde intelligentie in het UAV. Daarmee biedt het SDCS een mooi en praktisch toepasbaar onderzoeksonderwerp voor de (toename van) autonomie in vliegtuigen en ook andere voertuigen, op het gebied van onder andere kennisacquisitie, - formalisering en -representatie, beeldherkenning en beslissingslogica. Het zou heel interessant zijn om die trade-off tussen de mate van controle en de benodigde intelligentie in het vliegtuig in kaart te brengen.
34. 11 December 2018 Bernard van Gastel On the security of self-encrypting SSDs  
  Abstract:
Various SSDs offer an option to enable encryption at a firmware level. The security goals should be similar to those of disk encryption handled by software. And indeed, Bitlocker, the disk encryption built into Windows, fully relies on firmware-level encryption if it is present. However, when Bernard and Carlo Meijer (RU) investigated drives from Samsung and Crucial, they found several issues with the afforded security, such as a master password set to the empty string.
33. 16 October 2018 Johan Jeuring Intelligent Programming Tutors  
  Abstract:
Intelligent programming tutors (IPTs) are used to support students who learn how to program. There exist IPTs for almost any programming language. In this talk I will give a brief introduction to IPTs. In particular, I will discuss what kind of feedback and hints these tutors give, and what technologies they use to calculate feedback.
Slides
32. 25 Sept 2018 Josje Lodder Developing an e-learning tool for inductive proofs  
 

Abstract:
Proving is a central subject of most logic course. Students attending such courses do not only have to learn how to prove consequences in a formal system, but they also have to reason about formal systems. A typical example of an exercise that occurs in many textbooks is the following: prove that the number of left parentheses in a logical formula is equal to the number of right parentheses. These kinds of properties can be proved by structural induction, where the structure of the proof follows the structure of the inductive definition of the logical language. Students have conceptual as well as technical problems with inductive proofs. In this talk I will discuss our ideas and first results concerning the development of an e-learning tool that helps students to practice with inductive proofs.
Slides

31. 26 June 2018 Hugo Jonker Investigating Adblock wars  
 

Abstract:
We investigate the extent to which websites react to filter list updates. We collected data on the Alexa Top-10K for 120 days, and on specific sites newly added to filter lists for 140 days. By evaluating how long a filter rule triggers on a website, we can gauge how long it remains effective. We matched websites with both regular adblocking filter lists (Easylist) and with filter lists that remove anti-adblocking (primarily UblockProtector / NanoProtector). From our data, we observe that the effectiveness of the Easylist adblocking filter decays initially, but after ~80 days seems to stabilize. However, we found no evidence for any significant decay in effectiveness of the more specialized, but less widely used, Ublock Protector / Nano Protector list.

30. 24 April 2018 Stijn de Gouw Human-in-the-Loop Simulation of Cloud Services  
 

Abstract:
We present an integrated tool-suite for the simulation of software services which are offered on the Cloud. The tool -suite uses the Abstract Behavioral Specification (ABS) language for modeling the software services and their Cloud deployment. For the real-time execution of the ABS models we use a Haskell backend which is based on a source-to-source translation of ABS into Haskell. The tool-suite then allows Cloud engineers to interact in real-time with the execution of the model by deploying and managing service instances. The resulting human-in-the-loop simulation of Cloud services can be used both for training purposes and for the (semi-)automated support for the real-time monitoring and management of the actual service instances.
Slides

29. 27 March 2018 Benjamin Krumnow The Shepherd Project: Automated security audits of website login processes  
  Abstract:

HTTP does not have memory. HTTP Cookies were created to address this. These cookies thus allow web sites to remember things, such as whether you are logged in or not. After logging in, the cookies authenticate the user, so that the user does not have to type his password with every click. If an attacker steals these cookies, he could thus impersonate a user. In 2010, the Firesheep plugin for Firefox demonstrated how easy it was to steal such cookies for many popular web sites (Gmail, Facebook, etc.). The problem that Firesheep exploited was that cookies were sent over an insecure channel (HTTP instead of HTTPS). Such an attack is called a session hijacking attack.

In the Shepherd project, we investigate the adoption of countermeasures against session hijacking attacks. Over the last year, we have developed an automatic selenium-based scanner. This scanner performs an automated audit of the security of login processes of web sites.
In this OUrsi talk, I will give you an overview of the current scanning framework, explain how it assesses login security and talk about our current study, where we analyse over 65K websites. I hope that we will have lively discussion to identify possible unclear points and future directions.
Slides

28. 27 February 2018 Vincent van der Meer Future-proofing recovery of deleted files  
  Abstract:
Reconstructing deleted files, file carving, is a crucial part of digital forensic work, often concerning child pornography. The explosive growth of data makes file carving increasingly time-consuming and therefore increasingly inadequate, so that evidence remains hidden. File carving is still set up separately per device and per app, resulting in a considerable delay to develop file carving for new apps. This research will investigate how the app-specific part can be made as small as possible by abstraction and generalization. Spin-off will be a widely applicable, modular, maintainable and expandable framework with corresponding benchmark to compare the results of new technologies.
Slides
27. 19 February 2018 Hassan Alizadeh Intrusion Detection and Traffic Classification using Application-aware Traffic Profiles  
 

Abstract:
Along with the growing number of applications and end-users, online network attacks and advanced generations of malware have continuously proliferated. Many studies have addressed the issue of intrusion detection by inspecting aggregated network traffic with no knowledge of the responsible applications/services. Such systems fail to detect intrusions in applications whenever their abnormal traffic fits into the network normality profiles. This research addresses the problem of detecting intrusions in (known) applications when their traffic exhibits anomalies. To do so, a network monitoring tool needs to: (1) identify the source application responsible for the traffic; (2) have per-application traffic profiles; and (3) detect deviations from profiles given a set of traffic samples. This work is mainly devoted to the last two topics. Upon an initial survey on traffic classification techniques, this research proposes the use of three different kinds of Gaussian Mixture Model (GMM) learning approaches, namely classic, an unsupervised and Universal Background Model (UBM) to build per-application profiles. The effectiveness of the proposed approaches has been demonstrated by conducting different sets of experiments on two public datasets collected from real networks, where the source of each traffic flow is provided. Experimental results indicate that while the proposed approaches are effective for most applications, the UBM approach significantly outperforms other approaches.
Slides

26. 19 February 2018 Fenia Aivaloglou How Kids Code and How We Know  
 

Abstract:
Programming education is nowadays beginning from as early as the elementary school age. Block-based programming languages like Scratch are increasingly popular, and there is substantial research showing that they are suitable for early programming education. However, how do children program, and are they really learning? In this talk we will focus on Scratch and explore the programs that children write. By scraping the Scratch public repository we have download and analyzed 250K projects in terms of complexity, used abstractions and programming concepts, and code smells. We will further discuss the effect of code smells to novice Scratch programmers and our experiments in Dutch classrooms. Focusing on how to teach good programming practices to children, we designed the edx MOOC "Scratch: Programmeren voor kinderen" and we analyzed the results of its first run, where more that 2K children actively participated. We found factors that can predict successful course completion, as well as age-related differences in the students' learning performance. To further examine the children's motivation and self-efficacy and their relation to learning performance we run a Scratch course in three schools and we will present our initial findings.
Slides

25. 30 January 2018 Greg Alpár Blockchain for identity management, seriously?  
  Abstract:
Right now (Dec. 11, 2017, 15:45) 1 Bitcoin is €14228.71. This amazing success of a cryptocurrency has made the blockchain technology, underlying that of Bitcoin, so exciting for businesses that it is often called the next big revolution after the internet. Blockchains are proposed to solve all kinds of problems in our digital world. Besides the financial context, identity is considered to be another promising area for blockchains. Is it a good idea? In this OUrsi session, I would like to share with you some thoughts with regard to the application of blockchains in identity management. Since there is no big results in our study yet, this talk will hopefully turn into a fruitful discussion.

24. 30 November 2017 Pekka Aho Automated Model Extraction and Testing of Graphical User Interface Applications  
  Abstract:
The industry practice for test automation through graphical user interface is script-based. In some tools the scripts can be recorded from the user (capture & replay tools) and in others, the scripts are written manually. The main challenge with script-based test automation is the maintenance effort required when the GUI changes. One attempt to reduce the maintenance effort has been model-based testing that aims to generate test cases from manually created models. When the system under testing (SUT) changes, changing the models and re-generating the test cases is supposed to reduce the maintenance effort. However, creating the formal models allowing test case generation requires very specialized expertise. Random testing (monkey testing) is a maintenance-free and scriptless approach to automate GUI testing. In random GUI testing, the tool emulates an end-user by randomly interacting with the GUI under testing by pressing buttons and menus and providing input to other types of GUI widgets. More recent research aims to automatically extract the GUI models by runtime analysis while automatically exploring and interacting with the GUI (or during random testing). Test case generation based on extracted models is problematic but by comparing extracted GUI models of consequent versions it is possible to detect changes on the GUI. This approach aims to address the maintenance problem with scriptless regression testing and change analysis.
Slides
23. 30 November 2017 Twan van Laarhoven Local network community detection: K-means clustering and conductance  
  Abstract:
Local network community detection is the task of finding a single community of nodes in a graph, concentrated around few given seed nodes. The aim is to perform this search in a localized way, without having to look at the entire graph. Conductance is a popular objective function used in many algorithms for local community detection. In this talk I will introduce a continuous relaxation of conductance. I will show that continuous optimization of this objective still leads to discrete communities. I investigate the relation of conductance with weighted kernel k-means for a single community, which leads to the introduction of a new objective function, σ-conductance. Conductance is obtained by setting σ to 0. I will present two algorithms for optimizing σ-conductance, which show promising results on benchmark datasets.
Slides
22. 27 June 2017 Jeroen Keiren An O(m log n) algorithm for stuttering equivalence  
  Abstract:
In this presentation, I will discuss a new algorithm to determine stuttering equivalence with time complexity O(m log n), where n is the number of states and m is the number of transitions of a Kripke structure. Theoretically, this algorithm substantially improves upon existing algorithms, which all have time complexity of the order O(mn) at best. Moreover, when compared with other algorithms, the new algorithm has better or equal space complexity. Practical results confirm these findings: they show that the algorithm can outperform existing algorithms by several orders of magnitude, especially when the Kripke structures are large.
Slides
21. 13 June 2017 Stef Joosten Software development in relation algebra with Ampersand  
  Abstract:
Relation algebra can be used as a programming language for building information systems. We will demonstrate this principle by discussing a database-application for legal reasoning. In this case study, we will focus on the mechanisms of programming in relation algebra. Besides being declarative, relation algebra comes with attractive promises for developing big software. The case study was compiled with Ampersand, a compiler whose design and development is the focus of a research collaboration between Ordina and the Open University.
Slides
20. 9 May 2017 Hieke Keuning Code Quality Issues in Student Programs  
  Abstract:
Because low quality code can cause serious problems in software systems, students learning to program should pay attention to code quality early. Although many studies have investigated mistakes that students make during programming, we do not know much about the quality of their code. We have examined the presence of quality issues related to program-flow, choice of programming constructs and functions, clarity of expressions, decomposition and modularization in a large set of student Java programs. We investigated which issues occur most frequently, if students are able to solve these issues over time and if the use of code analysis tools has an effect on issue occurrence. We found that students hardly fix issues, in particular issues related to modularization, and that the use of tooling does not have much effect on the occurrence of issues.
Slides
19. 4 April 2017 Greg Alpár The Alphabet of ABCs  
  Abstract:
The digital world becomes more and more a part of the physical world. Data, bound to our identity, relationships and communications, is processed all around the world. This may result in privacy harms, including unwanted advertising, data creep, exclusion, insecurity, etc. Although much of this data could possibly be used without linking personally to you, currently everybody seems to accept that we have to give up on the control over our privacy. A fundamental technological paradigm today is that you almost have to identify yourself in order to use a given digital service. What if we could avoid the need for this? For instance, you could just prove that you have a train ticket without a central party storing an account about you and about all your trips. Or you could merely show that you are over 18 of age without having to show (and let save) your name and credential number, so you are eligible to buy a bottle of 'jenever'. Or you could do online shopping with giving away as little personal information as you used to in a department store in the 70s. Attribute-based credentials (ABCs) provide a wide variety of possible scenarios, in which you can prove that you are entitled to do things that otherwise you can only do by having to use identifiers. In the last OUrsi meeting Fabian presented the IRMA technology, one of the most important practical ABC projects, and its applications. In this OUrsi meeting, I will talk about the neat cryptographic and mathematic techniques that allow for the above-mentioned high level of flexibility and practical privacy. (Don't get scared, you won't need lots of background in mathematics to understand my presentation!)
Slides
18. 7 March 2017 Rolando Trujillo
(University of Luxembourg)
Counteracting active attacks in social network graphs  
  Abstract:
Active privacy attacks in social networks attempt to de-anonymize a social graph by using social links previously established between the attackers and their victims. Malicious users are hard to identify a priori, so active attacks are typically prevented by anonymizing (perturbing) the social graph prior publication. This talk is an introduction to this type of privacy attack and the anonymization techniques that have been proposed to counteract it. Towards the end of the talk we will discuss drawbacks of current approaches and possible improvements.
Slides
17. 7 February 2017 Fabian van den Broek IRMA, as simple as ABC  
  Abstract:
I will present the IRMA technology. IRMA (I Reveal My Attributes) is an attribute-based credential system in which users can obtain credentials stating attributes about themselves (e.g.\ I'm an employee of the Open University, with employee-number 1234). These attributes can be disclosed selectively to verifiers (e.g. I'm an employee of the Open University, xxxxxxxxxxxxxxxxxxxxx), offering a very flexible and potentially privacy-friendly way of authentication. This presentation will mostly focus on the progress we made in the last two years and the progress we are hoping to achieve in the coming years.
Slides

16. 25 October 2016 Jeroen Keiren Game-based approach to branching bisimulation  
  Abstract:
Branching bisimilarity and branching bisimilarity with explicit divergences are typically used in process algebras with silent steps when relating implementations to specifications. When an implementation fails to conform to its specification, i.e., when both are not related by branching bisimilarity (with explicit divergence), pinpointing the root causes can be challenging. In this paper, we provide characterisations of branching bisimilarity (with explicit divergence) as games between Spoiler and Duplicator, offering an operational understanding of both relations. Moreover, we show how such games can be used to assist in diagnosing non-conformance between implementation and specification.
Slides
15. 4 October 2016 Arjen Hommersom Flexible Probabilistic Logic for Solving the Problem of Multimorbidity  
  Abstract:
One of the central questions in artificial intelligence is how to combine knowledge representation principles with machine learning methods. Effectively dealing with structure and relations in data opens up a wealth of applications in areas such as economics, biology, and healthcare. In this talk I will discuss a recent project proposal, which aims to develop a new methodology that provides powerful methods for deriving knowledge from structured data using state-of-the-art probabilistic logics. These methods will be applied to deal with a highly significant and challenging problem in modern medicine: managing patients with multimorbidity, i.e., patients that have multiple chronic conditions. The main purpose of this talk is to obtain feedback as this project will likely soon be presented to an evaluation committee in the exact sciences.
Slides
14. 28 June 2016 Hugo Jonker Gaming the H-index  
  Abstract:
In this talk, we discuss some early results looking into formally modelling the scientific process, and elucidating what attacks can be performed on bibliographic metrics. Specifically, we distinguish between hacking, fraud, and gaming, provide initial notions of these three terms and initial results towards identifying which gaming attacks are possible on the H-index.
13. 31 May 2016 Josje Lodder Strategies for axiomatic Hilbert-style proofs  
  Abstract:
Een van de onderwerpen die studenten tegenkomen bij een logicacursus is een afleidingssysteem zoals natuurlijke deductie of axiomatisch afleiden. Het vinden van afleidingen is niet altijd eenvoudig. Studenten moeten hier veel mee oefenen, en een e-learning systeem kan hier bij helpen. Voor natuurlijke deductie bestaan er verschillende ondersteunende tools, maar voor axiomatisch afleiden zijn er geen tools die studenten helpen bij het maken van deze afleidingen. Om gerichte feedback en feedforward te kunnen geven hebben we strategieën ontwikkeld waarmee we automatisch axiomatische bewijzen kunnen genereren, en we gaan deze strategieën ook gebruiken om uitwerkingen van studenten te volgen, zodat we op elk moment gericht feedback kunnen geven.
Slides
12. 26 April 2016 Bastiaan Heeren Ask-Elle: an Adaptable Programming Tutor for Haskell Giving Automated Feedback  
  Abstract:
Ask-Elle is a tutor for learning the higher-order, strongly-typed functional programming language Haskell. It supports the stepwise development of Haskell programs by verifying the correctness of incomplete programs, and by providing hints. Programming exercises are added to Ask-Elle by providing a task description for the exercise, one or more model solutions, and properties that a solution should satisfy. The properties and model solutions can be annotated with feedback messages, and the amount of flexibility that is allowed in student solutions can be adjusted. The main contribution of our work is the design of a tutor that combines (1) the incremental development of different solutions in various forms to a programming exercise with (2) automated feedback and (3) teacher-specified programming exercises, solutions, and properties. The main functionality is obtained by means of strategy-based model tracing and property-based testing. We have tested the feasibility of our approach in several experiments, in which we analyse both intermediate and final student solutions to programming exercises, amongst others.
Slides
11. 29 March 2016 Hugo Jonker MitM attacks evolved  
  Abstract:
The security community seems to be thoroughly familiar with man-in-the-middle attacks. However, the common perception of this type of attack is outdated. It originates from when network connections were fixed, not mobile, before 24/7 connectivity became ubiquitous. The common perception of this attack stems from an era before the vulnerability of the protocol's context was realised. Thanks to revelations by Snowden and by currently available man-in-the-middle tools focused on protocol meta-data (such as so-called `Stingrays' for cellphones), this view is no longer tenable. Security protocols that only protect the contents of their messages are insufficient. Contemporary security protocols must also take steps to protect their context: who is talking to whom, where is the sender located, etc. In short: the attacker has evolved. It's high time for our security models and requirements to catch up.
Slides
10. 26 February 2016 Tanja Vos TESTAR: Automated Testing at the User Interface Level  
  Abstract:
Testing applications at the User Interface level is an important yet expensive and labour-intensive activity. Several tools exist to automate UI level testing. These tools are based on capture replay or visual imagine recognition. We present TESTAR, a tool for automated GUI testing that takes a totally different approach and has demonstrated to be highly useful in practice. Using TESTAR, you can start testing immediately and you do not need to specify test cases in advance. TESTAR automatically generates and executes random test sequences based on a structure that is automatically derived from the UI through the accessibility API. Defects that can be found this way are crashes or suspicious outputs. For defects related to functionality, expected outputs need to be added to the tool. This can be done incrementally. The structure on which testing is based is build automatically during testing, the UI is not assumed to be fixed and tests will run even though the UI evolves, which will reduce the maintenance problem that threatens the approaches mentioned earlier.
Slides
9. 23 February 2016 Freek Verbeek Cross-layer invariants for Network-on-Chips  
  Abstract:
One of the major challenges in the design and verification of manycore systems is cache coherency. In bus-based architectures, this is a well-studied problem. When replacing the bus by a communication network, however, new problems arise. Cross-layer deadlocks can occur even when the protocol and the network are both deadlock-free when considered in isolation. To counter this problem, we propose a methodology for deriving cross-layer invariants for Network-on-Chips. These invariants relate the state of the protocols run by the cores to the state of the communication network. We show how they can be used to prove the absence of cross-layer deadlocks. Our approach is generally applicable and shows promising scalability.
Slides
8. 26 January 2016 Harald Vranken About a recently submitted research proposal
   

7. 27 October 2015 Bernard van Gastel Inschatten van energieverbruik van software door middel van een typesysteem  
  Abstract:
Energieverbruik speelt een steeds bepalendere rol binnen ICT. In veel toepassingsgebieden is het belangrijk om het energieverbruik te voorspellen, en ervoor te kunnen optimaliseren. Deze methoden moeten praktisch zijn: zowel snel feedback geven als voor grote programma/bibliotheken toepasbaar zijn. Omdat het energieverbruik sterkt afhangt van de hardware die gebruikt wordt, moet de gebruikte hardware snel in te stellen zijn en te wisselen zijn. In dit kader stellen wij een type systeem voor dat statisch gebruikt kan worden om energie verbruik compositioneel af te leiden. Dit type systeem werkt op een kleine theoretische programmeertaal waarbij hardware interacties expliciet gemaakt zijn.
Slides
6. 29 Sept 2015 Sung-Shik Jongmans Protocol Programming with Automata: (Why and) How?  
  Abstract:
With the advent of multicore processors, exploiting concurrency has become essential in writing scalable programs on commodity hardware. One important aspect of exploiting concurrency is programming protocols. In this talk, I explain my recent work on programming protocols based on a theory of automata.
Slides
5. 25 August 2015 Sylvia Stuurman Setup for a experiment on teaching refactoring  
  Slides
4. 30 June 2015 Jeroen Keiren State Space Explosion: Facing the Challenge  
  Abstract:
When processes run in parallel, the total number of states in a system grows exponentially; this is typically referred to as the state space explosion problem. The holy grail in model checking is finding a way in which we can effectively cope with this state space explosion. In this talk I discuss a line of research in which we use an equational fixed point logic and parity games as a basis for verification. I will focus on the latter, and discuss ideas that can be used to reduce these parity games. For those wondering whether this general theory has any applications at all I discuss industrial applications of this technology. If you want to know what these applications are, come listen to my talk!
Slides
3. 26 May 2015 Arjen Hommersom Probabilistic Models for Understanding Health Care Data  
  Abstract:
In the first part of this talk, I will give an overview of my recent research in using probabilistic models for health care applications. In the second part, I will zoom into recent results from the NWO CAREFUL project, where we attempt to reconcile probabilistic automata learning with Bayesian network modelling. In particular, the aim is to identify the probabilistic structure of states within a probabilistic automaton. We propose a new expectation-maximisation algorithm that exploits the state-based structure of these models. This model is currently being applied to learn models from psychiatry data to provide novel insights into the treatment of psychotic depressions. Preliminary results will be discussed.
Slides
2. 28 April 2015 Hugo Jonker FP-Block: usable web privacy by controlling browser fingerprinting  
  Abstract:
Online tracking of users is used for benign goals, such as detecting fraudulous logins, but also to invade user privacy. We posit that for non-oppressed users, tracking within one website does not have a substantial negative impact on privacy, while it enables legitimate benefits. In contrast, cross-domain tracking negatively impacts user privacy, while being of little benefit to the user. Existing methods to counter tracking treat any and all tracking similar: client-side storage is blocked, or all sites are fed random characteristics to hamper re-identification. We develop a tool, FP-Block, that counters cross-domain tracking, while allowing intra-domain tracking. For each visited site, FP-Block generates a unique, consistent fingerprint: a "web identity". This web identity is then used for any interaction with that site, including for third-party embedded content. This ensures that ubiquitously embedded parties will see different, unrelatable fingerprints from different sites, and can thus no longer track the user across the web.
Slides
1. March 31 2015 Christoph Bockisch A design method for modular energy-aware software  
  Abstract:
Nowadays achieving green software by reducing the overall energy consumption of the system is becoming more and more important. A well-known solution is to make the software energy-aware by extending its functionality with energy optimizers, which monitor the energy consumption of software and adapt it accordingly. Modular design of energy-aware software is necessary to make the extensions manageable and to cope with the complexity of the software. To this aim, we propose a model of the energy utilization -- or more general resource utilization -- and an accompanying design method for components in energy-intensive systems. Optimizer components can analyze such Resource Utilization Models (RUM) of other components in terms of such a model and adapt their behavior accordingly. We show how compact RUMs can be extracted from existing implementations. Using the Counterexample-Guided Abstraction Refinement (CEGAR) approach, along with model-checking tools, abstract models can be generated that help establish key properties relating to energy consumption. This approach is illustrated by the concrete example of a network manager sub-system.
Slides