DataFrame Operations: A Framework-Agnostic Library

Dataframe operations form the foundation of modern data science workflows, yet researchers often face vendor lock-in once they commit to a particular dataframe framework. My work addresses this with a unified operations library that abstracts away framework-specific implementations. The challenge lies in the fundamental differences between these systems: some pandas-like dataframes use row-by-row operations, others employ lazy evaluation with native expressions, some require distributed execution patterns, and others focus on parallelization.

I’ve implemented operations such as merge, reduce, replace, forward-fill, and mathematical transformations that behave identically across all supported frameworks. A benchmarking infrastructure, built on synthetic datasets at small, medium, and large scales, lets me optimize performance while checking that behavior stays consistent.
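To illustrate the idea of one operation name backed by framework-specific code, here is a minimal sketch of forward-fill dispatched by frame type. It is not the library's actual implementation: `RowTable`, `ColumnTable`, and `forward_fill` are hypothetical names, and the two toy backends use only the standard library as stand-ins for real dataframe frameworks.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Any, Dict, List


@dataclass
class RowTable:
    """Toy row-oriented backend (hypothetical): a list of row dicts."""
    rows: List[Dict[str, Any]]


@dataclass
class ColumnTable:
    """Toy column-oriented backend (hypothetical): a dict of column lists."""
    columns: Dict[str, List[Any]]


def _ffill_values(values):
    """Replace each None with the most recent non-None value, if there is one."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


@singledispatch
def forward_fill(frame, column: str):
    """Public entry point: callers use this name regardless of backend."""
    raise TypeError(f"no forward_fill backend for {type(frame).__name__}")


@forward_fill.register
def _(frame: RowTable, column: str) -> RowTable:
    # Row-oriented backend: pull the column out, fill it, rebuild the rows.
    values = _ffill_values([row.get(column) for row in frame.rows])
    return RowTable([{**row, column: v} for row, v in zip(frame.rows, values)])


@forward_fill.register
def _(frame: ColumnTable, column: str) -> ColumnTable:
    # Column-oriented backend: fill the column list directly.
    return ColumnTable({**frame.columns, column: _ffill_values(frame.columns[column])})


# The same call works on both backends and returns the same kind of frame.
rows = RowTable([{"t": 1, "x": 1.0}, {"t": 2, "x": None}, {"t": 3, "x": 3.0}])
cols = ColumnTable({"t": [1, 2, 3], "x": [1.0, None, 3.0]})
row_result = forward_fill(rows, "x")
col_result = forward_fill(cols, "x")
```

In a real setting, each registered function would wrap the framework's own native forward-fill rather than a shared Python loop. The dispatch layer only guarantees that the call signature and the result semantics match across backends.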

DataFrames

Scientific Topics of Interest

Series: Science

Intrinsically Disordered Proteins (IDPs)

Most of what we know about proteins concerns structured proteins with specific functions. However, many proteins lack a single, well-defined structure, and this does not imply a lack of function. This is what makes IDPs so interesting. More broadly, the formation of protein-protein interactions underlies how proteins evolve to perform different functions.

IDPs