How can we improve knowledge distillation from large unstructured data?
VPNs are increasingly promoted as a privacy-enhancing technology that protects users from surveillance and cyber attacks.¹ While most VPN protocols encrypt users' browsing traffic, the research community has repeatedly shown that an adversary who can observe this encrypted traffic can fingerprint the websites a user visits without breaking the encryption. We investigate whether advances in VPN technologies over the last two decades have made VPNs harder to fingerprint.
In this study, we investigate the capabilities and inherent biases of advanced large language models (LLMs) such as GPT-3.5 and GPT-4 in the context of debate evaluation.
This practitioner paper describes a new multi-locality benchmark program for measuring memory access latency and uses it to study recent AMD machines equipped with 3D vertical cache (V-Cache), which can exceed 1 GiB in total size on a single node.
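A common way to measure load latency is pointer chasing: each load's address comes from the previous load, so the hardware cannot overlap or prefetch them, and repeating the walk over buffers of different sizes exposes each level of the memory hierarchy. We do not know whether the paper's benchmark works this way; the function names below are hypothetical. The chain is built with Sattolo's algorithm, which yields a single random cycle through every slot.

```python
import random
import time

def sattolo_cycle(n, rng):
    """Return next_idx such that following i -> next_idx[i] visits all n
    slots in one random cycle (Sattolo's algorithm)."""
    perm = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i strictly: this is what forces a single cycle
        perm[i], perm[j] = perm[j], perm[i]
    return perm

def chase(next_idx, hops):
    """Follow the chain for `hops` steps; each step depends on the last."""
    i = 0
    for _ in range(hops):
        i = next_idx[i]
    return i

def ns_per_hop(n, hops=1_000_000, seed=0):
    """Average time per dependent hop over a working set of n slots."""
    next_idx = sattolo_cycle(n, random.Random(seed))
    start = time.perf_counter_ns()
    chase(next_idx, hops)
    return (time.perf_counter_ns() - start) / hops
```

Calling `ns_per_hop` for working sets such as `1 << 10`, `1 << 16`, and `1 << 22` slots shows the access pattern, but in CPython the timing is dominated by interpreter overhead; a real latency benchmark would run the same loop in C over a contiguous array.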
In this paper, we introduce the causal data lake discovery problem and propose a large language model (LLM)-based framework to discover potential pairwise causal links between columns from different tables.
This thesis examines multi-objective multi-armed bandit (MO-MAB) algorithms, their broad applications, and their potential to enhance hyperparameter optimization (HPO).
Archive
We present TSUBASA, an algorithm for efficiently computing exact pairwise Pearson correlations between time series.
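Pearson's correlation can be computed exactly from six running sums (n, Σx, Σy, Σx², Σy², Σxy), so sums stored per fixed-size window can be merged to answer any window-aligned range without revisiting the raw data. The sketch below shows this generic sufficient-statistics approach for all pairs of series; the window size and all names are illustrative assumptions, and we do not claim this is TSUBASA's actual algorithm.

```python
import math
from dataclasses import dataclass
from itertools import combinations

@dataclass
class WindowStats:
    """Sufficient statistics for Pearson correlation over one window."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    syy: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "WindowStats") -> "WindowStats":
        # Sums are additive, so merging windows is exact.
        return WindowStats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                           self.sxx + other.sxx, self.syy + other.syy, self.sxy + other.sxy)

def window_stats(x, y, size):
    """One WindowStats per window of `size` aligned points (last may be shorter)."""
    stats = []
    for start in range(0, len(x), size):
        xs, ys = x[start:start + size], y[start:start + size]
        stats.append(WindowStats(len(xs), sum(xs), sum(ys),
                                 sum(a * a for a in xs), sum(b * b for b in ys),
                                 sum(a * b for a, b in zip(xs, ys))))
    return stats

def pearson_from_stats(windows):
    """Exact Pearson r over the union of the given windows (nan if a series is constant)."""
    t = WindowStats()
    for w in windows:
        t = t.merge(w)
    num = t.n * t.sxy - t.sx * t.sy
    den = math.sqrt((t.n * t.sxx - t.sx ** 2) * (t.n * t.syy - t.sy ** 2))
    return num / den if den else float("nan")

def all_pairs(series, size):
    """Correlation for every pair of named series over their full length."""
    return {(a, b): pearson_from_stats(window_stats(series[a], series[b], size))
            for a, b in combinations(sorted(series), 2)}
```

Because the per-window statistics are kept, a query over a sub-range such as `pearson_from_stats(ws[1:3])` costs one merge per window rather than one pass over the points.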
Tracking Words
We present findings from a project that studies how word meanings change over time.
An alternative solution design for the Laser-Plasma Simulation Environment (LPSE).
This study proposes a semi-supervised method for obtaining arousal-valence annotations of a speech corpus when only discrete emotion category information is available.
Sampling Over Union of Joins
Given a set of joins, we study how to obtain a random sample from their union without computing the full joins and union, thereby avoiding their cost.
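One standard way to sample uniformly from such a union combines two ideas: sample uniformly from each join by picking a left tuple with probability proportional to its number of join partners, and correct for tuples produced by several joins by accepting a proposed tuple with probability 1/c, where c is the number of joins that produce it. This is a generic illustration, not necessarily the method this project develops; it assumes two-table equi-joins with a common output schema and set semantics, and the class and function names are hypothetical.

```python
import random
from collections import defaultdict

class TwoWayJoin:
    """Equi-join R(key, a) JOIN S(key, b) on `key`, never materialized.
    Assumes set semantics (no duplicate tuples)."""

    def __init__(self, r_rows, s_rows):
        self.r_rows = list(r_rows)
        s_rows = list(s_rows)
        self.s_index = defaultdict(list)  # key -> list of b values
        for key, b in s_rows:
            self.s_index[key].append(b)
        # Weight of each R tuple = number of S tuples it joins with.
        self.weights = [len(self.s_index.get(key, ())) for key, _ in self.r_rows]
        self.size = sum(self.weights)  # exact join size without computing the join
        self.r_set, self.s_set = set(self.r_rows), set(s_rows)

    def sample(self, rng):
        """Uniform sample from this join's output: each output tuple has probability 1/size."""
        key, a = rng.choices(self.r_rows, weights=self.weights, k=1)[0]
        return (key, a, rng.choice(self.s_index[key]))

    def contains(self, t):
        """Membership test for an output tuple (key, a, b)."""
        key, a, b = t
        return (key, a) in self.r_set and (key, b) in self.s_set

def sample_union(joins, rng):
    """Uniform sample from the union of the joins' outputs (rejection sampling)."""
    sizes = [j.size for j in joins]
    if sum(sizes) == 0:
        raise ValueError("every join is empty")
    while True:
        j = rng.choices(joins, weights=sizes, k=1)[0]
        t = j.sample(rng)
        c = sum(other.contains(t) for other in joins)  # c >= 1: j produces t
        if rng.random() < 1.0 / c:
            return t
```

A tuple produced by c joins is proposed c times as often as a tuple produced by one join; accepting it with probability 1/c makes every tuple in the union equally likely, at an expected cost of (total join size) / (union size) proposals per sample.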
CBET Simulation on GPUs
Cross-Beam Energy Transfer (CBET) is the process by which intersecting laser beams exchange energy in a plasma; a CBET simulation models how the beams share that energy.
A translation validation tool for a small compiler backend called QBE. The tool can identify invalid intermediate language transformations and optimizations.
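Translation validation checks each individual compilation instead of proving the compiler correct once: after a pass runs, a validator tries to show that its output is equivalent to its input. The toy sketch below illustrates the idea under strong assumptions: a tiny straight-line SSA language with only `add`, `sub`, and `mul` on unbounded integers, which is not QBE's real IL and ignores its fixed-width wraparound semantics. Both programs are evaluated symbolically and the normalized results compared; equal results prove equivalence, while unequal results only mean the transformation could not be validated. All names are hypothetical.

```python
import operator

OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
COMMUTATIVE = {"add", "mul"}

def simplify(op, a, b):
    """Build a normalized expression: fold constants, order commutative operands."""
    if isinstance(a, int) and isinstance(b, int):
        return OPS[op](a, b)
    if op in COMMUTATIVE and repr(a) > repr(b):
        a, b = b, a
    return (op, a, b)

def symbolic_eval(program, ret):
    """Program: list of (dest, op, arg1, arg2); args are ints or temp names.
    Names never assigned are treated as symbolic function inputs."""
    env = {}
    def val(x):
        return x if isinstance(x, int) else env.get(x, x)
    for dest, op, x, y in program:
        env[dest] = simplify(op, val(x), val(y))
    return val(ret)

def validate(src, src_ret, tgt, tgt_ret):
    """True if the transformed program provably computes the same value."""
    return symbolic_eval(src, src_ret) == symbolic_eval(tgt, tgt_ret)
```

For example, a constant-folding pass that rewrites `%t = add 2, 3; %r = mul %x, %t` into `%r = mul 5, %x` validates, while a buggy fold to `mul %x, 6` is rejected.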
Augmented Transactional Memory
Multithreading complicates synchronization among threads that share memory.
A buffered variant of the Pronto persistence framework that improves performance by 1.26× to 6.77×, with any data loss confined to a window of at most 1 ms.
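The trade-off a buffered design makes, higher throughput in exchange for a bounded window of possible data loss, can be illustrated with a group-commit log. In this hypothetical sketch a file stands in for persistent memory and the class and method names are invented; it is not Pronto's API. Records are buffered in memory and written and `fsync`ed once the oldest buffered record is `max_delay` seconds old, so one expensive sync is amortized over many records and a crash loses at most about `max_delay` of work, provided `tick()` is also called periodically (for example from a timer thread, not shown).

```python
import os
import time

class BufferedLog:
    """Hypothetical group-commit log with a bounded data-loss window."""

    def __init__(self, path, max_delay=0.001, clock=time.monotonic):
        self._file = open(path, "ab")
        self._buffer = []
        self._oldest = None  # clock time of the oldest unflushed record
        self.max_delay = max_delay
        self._clock = clock

    def append(self, record: bytes) -> None:
        if self._oldest is None:
            self._oldest = self._clock()
        self._buffer.append(record)
        self.tick()

    def tick(self) -> None:
        """Flush if the oldest buffered record has waited max_delay seconds."""
        if self._oldest is not None and self._clock() - self._oldest >= self.max_delay:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        self._file.write(b"".join(r + b"\n" for r in self._buffer))
        self._file.flush()             # hand the bytes to the OS
        os.fsync(self._file.fileno())  # force them to stable storage
        self._buffer.clear()
        self._oldest = None

    def close(self) -> None:
        self.flush()
        self._file.close()
```

The injectable `clock` parameter exists only so the flushing rule can be tested deterministically.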
Verifying Lake Ontario’s Water Level
The Caldwell-Fay equation (2002) attempts to model what Lake Ontario's current water level would be if dam construction had never taken place along the St. Lawrence Seaway (i.e., the lake's natural hydraulic state).
Lake Ontario data going back to the 1860s was recently unearthed, and we had the rare opportunity to be the first to digitize and publicly analyze it.
Because this data set predates any dam construction, it captures the lake's natural state and can therefore be used to verify the Caldwell-Fay equation, which is used to govern the lake's inflow and outflow rates on a daily basis.
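Verifying the equation against the historical record amounts to comparing the levels it predicts with the digitized observations for the same dates. A minimal sketch, assuming the two series are already aligned and in the same units, computes three common goodness-of-fit measures: mean bias, root-mean-square error, and the Nash-Sutcliffe efficiency widely used in hydrology. The choice of metrics and the function name are our assumptions, and the Caldwell-Fay equation itself is not reproduced here.

```python
import math

def compare_levels(observed, modeled):
    """Return (mean bias, RMSE, Nash-Sutcliffe efficiency) of modeled vs observed.
    NSE = 1 means a perfect fit; NSE <= 0 means no better than the observed mean."""
    if len(observed) != len(modeled) or not observed:
        raise ValueError("series must be non-empty and the same length")
    n = len(observed)
    errors = [m - o for o, m in zip(observed, modeled)]
    sse = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    if ss_obs == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return sum(errors) / n, math.sqrt(sse / n), 1.0 - sse / ss_obs
```

A persistent nonzero bias would suggest a systematic offset in the equation, while a low efficiency would suggest it fails to track the lake's seasonal and year-to-year variation.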
Building a Web Application from Scratch
This semester I built an e-commerce application to sell my artwork and donate the profits to charity. I built a Node.js server using the Express framework and middleware to implement authentication, session management, and security features. The views use the Jade templating language: the server passes run-time variables into the templates, renders them into HTML, and sends the result to the browser. I designed a relational database to store the relevant information and implemented it in MySQL.
Hyperion is a 3D visualization platform for optical design. It provides a fully immersive and interactive 3D user experience and enables the visualization of models of folded freeform optical systems. The frontend user experience is supported by the computational ray-tracing engine of Eikonal+, an optical design research software package. We have built a cross-platform, lightweight version of Eikonal+ that can communicate with any user interface. We have also demonstrated a prototype of the 3D user experience on a HoloLens AR display.
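At the core of any geometric ray-tracing engine is the refraction of a ray at a surface between two media. The vector form of Snell's law below is a generic illustration, not Eikonal+ code; the function name and the convention that the surface normal points back toward the incoming ray are assumptions.

```python
import math

def refract(d, n, n1, n2):
    """Refract unit direction d at a surface with unit normal n, where n points
    back toward the incoming ray (dot(n, d) < 0), passing from refractive index
    n1 into n2. Returns the unit refracted direction, or None on total
    internal reflection."""
    eta = n1 / n2
    cos_i = -sum(a * b for a, b in zip(n, d))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)  # Snell: sin_t = eta * sin_i
    if sin2_t > 1.0:
        return None  # total internal reflection: no transmitted ray
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * a + (eta * cos_i - cos_t) * b for a, b in zip(d, n))
```

A ray entering glass (n2 = 1.5) from air at 30° leaves the surface with a sine of 1/3 for its refraction angle, and a ray leaving glass at 60° is totally internally reflected, so the function returns `None`.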
Memcached is a widely used key-value (KV) store. It is structured as a multithreaded user-level server, accessed over socket connections by a potentially distributed collection of clients. Because socket communication is so much more expensive than a single operation on a KV store, much of the client library is devoted to batching requests. Batching is not always feasible, however, and the cost of communication seems particularly unfortunate when, as is often the case, clients are co-located on a single machine with the server and have access to the same physical memory.
Fortunately, recent work on protected libraries has shown that, on current Intel processors, it is possible to amplify access rights quickly when calling into a specially configured user-level library. Library instances in separate processes can then share data safely, even in the face of independent process failures. We have used protected libraries to implement a new version of memcached in which client threads execute the server's code themselves, without the need to send messages. Compared to the original, our new version is both significantly simpler, containing 24% less code, and dramatically faster, with an 11–56× reduction in latency and a roughly 2× increase in throughput.
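The client-side batching that the original memcached design relies on can take the form of a single `get` command naming many keys, so one socket round trip serves several lookups. The sketch below only builds such a request and parses the reply in memcached's text protocol (`VALUE <key> <flags> <bytes>` blocks terminated by `END`); it contacts no server, and the helper names are hypothetical rather than part of any memcached client library.

```python
def build_multiget(keys):
    """Encode one `get` request line that fetches all `keys` at once."""
    if not keys:
        raise ValueError("need at least one key")
    for k in keys:
        # memcached keys: at most 250 bytes, no whitespace or control characters
        if not 0 < len(k.encode()) <= 250 or any(ord(c) <= 32 or ord(c) == 127 for c in k):
            raise ValueError(f"invalid memcached key: {k!r}")
    return ("get " + " ".join(keys) + "\r\n").encode()

def parse_multiget(reply: bytes):
    """Parse VALUE blocks up to END into {key: (flags, data)}; misses are absent."""
    values, pos = {}, 0
    while True:
        eol = reply.index(b"\r\n", pos)
        line = reply[pos:eol].decode()
        pos = eol + 2
        if line == "END":
            return values
        parts = line.split()
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected reply line: {line!r}")
        key, flags, nbytes = parts[1], int(parts[2]), int(parts[3])
        if reply[pos + nbytes:pos + nbytes + 2] != b"\r\n":
            raise ValueError("data block is not terminated by CRLF")
        values[key] = (flags, reply[pos:pos + nbytes])
        pos += nbytes + 2  # skip the data block and its trailing CRLF
```

Keys the server does not hold simply have no `VALUE` block, which is why a miss shows up as an absent dictionary entry rather than an error.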