Use Danfo.js to Manipulate Data In JavaScript Like a Pandas Pro

Iris S
4 min read · Jul 3, 2021

Inspiration

For a personal project, I built Algo Tracker, a web application for tracking algorithm practice performance. It has three major features:

  1. Track daily pass/fail distribution
  2. Error Analysis
  3. Performance Benchmarking

Each of them involves lengthy, tedious data aggregation and wrangling. Try to imagine aggregating data nested four levels deep. That is not uncommon, especially when building visualizations such as tree maps, stacked bar charts, or area charts: all the common ones we see in corporate presentations.

If you’re doing any modeling work, you know the freakin’ deal, because all the preprocessing work is nuts.

So a better solution for data manipulation in JavaScript is needed. Today we are walking through Danfo.js, comparing it to Pandas side by side to cover the basics.

Some basic terminology before we get started:

Data aggregation / flattening

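  • Transforming the structure of a data table from one shape (A) into another (B), for example collapsing nested records into flat rows or rolling rows up into summaries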
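
As a rough illustration of going from shape A to shape B, the sketch below flattens a small nested object into flat rows using plain JavaScript with no libraries. The four-level layout (user, date, question type, technique), the helper name flatten, and every value are made up for illustration; they are not taken from the Algo Tracker dataset.

```javascript
// Hypothetical nested practice log: userId > date > qType > qTechnique > pass flags
const nested = {
  "user-1": {
    "2021-06-01": {
      Array: { Pointers: [1, 0], Sorting: [1] },
      DP: { Memoization: [0] },
    },
  },
  "user-2": {
    "2021-06-01": {
      Array: { Pointers: [1] },
    },
  },
};

// Walk the four levels and emit one flat row per attempt
function flatten(data) {
  const rows = [];
  for (const [userId, byDate] of Object.entries(data)) {
    for (const [date, byType] of Object.entries(byDate)) {
      for (const [qType, byTechnique] of Object.entries(byType)) {
        for (const [qTechnique, passes] of Object.entries(byTechnique)) {
          for (const pass of passes) {
            rows.push({ userId, date, qType, qTechnique, pass });
          }
        }
      }
    }
  }
  return rows;
}

const flatRows = flatten(nested);
console.table(flatRows);
```

Four nested loops for a single reshaping step is exactly the kind of boilerplate a dataframe library is meant to replace.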

What is Pandas?

  • pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

Dataframe

  • The pandas representation of a table: rows and labeled columns

Why should we use pandas?

  • Whenever any sort of data manipulation (gymnastics) is needed

How

  • Where to use it: in a Jupyter notebook (handy for visualizing results) or in an IDE

What is Danfo.js?

  • Danfo.js is an open-source, JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.
  • Danfo.js is heavily inspired by the Pandas library and provides a similar interface and API. This means users familiar with the Pandas API can easily use Danfo.js.

Why

  • There are alternatives such as D3.js, but it doesn’t have a dataframe feature. Danfo.js is built on top of TensorFlow.js, so users who want to build ML models benefit from a smooth transition from data preparation to modeling.
  • It lets you manipulate datasets directly in JavaScript during web development

Similarity

  • The concept of dataframe
  • Syntax — very much the same

Difference

  • Danfo.js runs only in JavaScript environments (Node.js and the browser)

With a basic understanding of Pandas and Danfo.js in place, let’s compare the syntax side by side.

Reading CSV files — Pandas

import pandas as pd

df = pd.read_csv('https://storage.googleapis.com/daily_practice_csv/practiceData.csv')
df.head()
Output: Pandas Dataframe

Reading CSV files — Danfo.js

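const dfd = require("danfojs-node")

// read_csv is the name in the Danfo.js release used here; releases 1.0 and later rename it to readCSV
dfd.read_csv("practiceData.csv")
  .then(async (df) => {
    df['qTechnique'].head().print()
  })
  .catch(err => {
    console.log(err);
  })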
Output: Danfo.js Dataframe

Columns Operations

Pandas — Selecting Columns

df['date'].head()

Danfo.js — Selecting Columns

const dfd = require("danfojs-node")

dfd.read_csv("practiceData.csv")
  .then(async (df) => {
    df['qTechnique'].head().print()
  })
  .catch(err => {
    console.log(err);
  })

Adding Columns — Pandas

df['date_dt'] = df['date'].astype('datetime64[ns, US/Eastern]')
df.head()

Adding Columns — Danfo

const dfd = require("danfojs-node")

dfd.read_csv("practiceData.csv")
  .then(async (df) => {
    // Count attempts and sum passes per user per day
    let df2 = await df.groupby(['userId', 'date']).agg({ "qTechnique": "count", "pass": "sum" })
    let qc = df2['qTechnique_count']
    let ps = df2['pass_sum']
    // New column: failures = attempts - passes
    df2.addColumn({ 'column': 'fail_numm', "value": qc.sub(ps) })
    df2.head().print()
  })
  .catch(err => {
    console.log(err);
  })
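
To make the arithmetic behind that new column concrete, here is a dependency-free sketch in plain JavaScript of the same idea: count attempts and sum passes per user per day, then derive failures as the difference. It does not use Danfo.js and is not how Danfo.js works internally. The sample rows are made up, and the sketch counts rows rather than non-null qTechnique values, which is an assumption about what the count is meant to capture.

```javascript
// Hypothetical attempts (values are made up); pass is 1 for a pass, 0 for a fail
const attempts = [
  { userId: "user-1", date: "2021-06-01", qTechnique: "Pointers", pass: 1 },
  { userId: "user-1", date: "2021-06-01", qTechnique: "Sorting", pass: 0 },
  { userId: "user-1", date: "2021-06-02", qTechnique: "Pointers", pass: 1 },
  { userId: "user-2", date: "2021-06-01", qTechnique: "Memoization", pass: 0 },
];

// Aggregate per (userId, date): number of attempts and number of passes
const daily = new Map();
for (const { userId, date, pass } of attempts) {
  const key = `${userId}|${date}`;
  const entry = daily.get(key) || { userId, date, qTechnique_count: 0, pass_sum: 0 };
  entry.qTechnique_count += 1;
  entry.pass_sum += pass;
  daily.set(key, entry);
}

// Derived column: failures = attempts - passes
const summary = [...daily.values()].map((entry) => ({
  ...entry,
  fail_numm: entry.qTechnique_count - entry.pass_sum,
}));

console.table(summary);
```

The column names mirror the ones used in the Danfo.js snippet above so the two versions can be compared line by line.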

Performing Row Operations

Filter by value — Pandas

df.loc[df["qType"] == 'Array', ["date", "qType", "qTechnique", "pass"]].head()

Danfo
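
The sketch below does not use Danfo.js. It is a plain JavaScript illustration, on made-up rows, of what the pandas filter above selects: keep the rows whose qType is "Array", then keep only the date, qType, qTechnique, and pass columns. Check the Danfo.js documentation for the row-filtering call that matches your installed version.

```javascript
// Hypothetical rows shaped like the practice data (values are made up)
const rows = [
  { date: "2021-06-01", userId: "user-1", qType: "Array", qTechnique: "Pointers", pass: 1 },
  { date: "2021-06-01", userId: "user-1", qType: "DP", qTechnique: "Memoization", pass: 0 },
  { date: "2021-06-02", userId: "user-2", qType: "Array", qTechnique: "Sorting", pass: 0 },
];

// Row filter first, then column projection (the two arguments to .loc in pandas)
const arrayRows = rows
  .filter((row) => row.qType === "Array")
  .map(({ date, qType, qTechnique, pass }) => ({ date, qType, qTechnique, pass }));

console.table(arrayRows);
```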

Adding rows — Pandas

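# DataFrame.append was removed in pandas 2.0; pd.concat is the supported way to add rows
new_row = {'date': '2021-06-01', 'userId': 'xmiris.shi@gmail.com', 'qType': 'DP', 'qTechnique': 'Pointers', 'difficulty': 'hard'}
pd.concat([df, pd.DataFrame([new_row])], ignore_index=True).tail()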

Single Level Aggregations — Pandas

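# numeric_only=True skips text columns such as qType, which pandas 2.0+ would otherwise reject
df.groupby('userId').mean(numeric_only=True)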

Single Column Aggregations — Danfo

dfd.read_csv("practiceData.csv")
  .then(async (df) => {
    df.groupby(['userId']).agg({ "pass": "mean" }).print()
  })
  .catch(err => {
    console.log(err);
  })

Multi-column Aggregation: Pandas

# numeric_only=True skips text columns, which pandas 2.0+ would otherwise reject
df.groupby(['userId', 'qType', 'qTechnique']).mean(numeric_only=True)

Multi-column Aggregation: Danfo

dfd.read_csv("practiceData.csv")
  .then(async (df) => {
    df.groupby(['userId', 'qType', 'qTechnique']).agg({ "pass": "mean" }).print()
  })
  .catch(err => {
    console.log(err);
  })
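
For a sense of what a grouped mean computes, and of the code a dataframe library saves you from writing, here is a dependency-free sketch in plain JavaScript. The helper name groupMean and the sample rows are hypothetical, and this is not a description of how Danfo.js implements groupby. The same helper covers both the single-column and multi-column cases by accepting a list of key columns.

```javascript
// Hypothetical rows shaped like the practice data (values are made up)
const practice = [
  { userId: "user-1", qType: "Array", qTechnique: "Pointers", pass: 1 },
  { userId: "user-1", qType: "Array", qTechnique: "Pointers", pass: 0 },
  { userId: "user-1", qType: "DP", qTechnique: "Memoization", pass: 1 },
  { userId: "user-2", qType: "Array", qTechnique: "Pointers", pass: 1 },
];

// Group rows by one or more key columns and average a numeric column
function groupMean(rows, keys, valueColumn) {
  const groups = new Map();
  for (const row of rows) {
    // Serialize the key values so several columns form one composite key
    const id = JSON.stringify(keys.map((k) => row[k]));
    const group = groups.get(id) || { sum: 0, count: 0, first: row };
    group.sum += row[valueColumn];
    group.count += 1;
    groups.set(id, group);
  }
  // Emit one output row per group: the key columns plus the mean
  return [...groups.values()].map(({ sum, count, first }) => {
    const out = {};
    for (const k of keys) out[k] = first[k];
    out[`${valueColumn}_mean`] = sum / count;
    return out;
  });
}

const byUser = groupMean(practice, ["userId"], "pass");
const byUserTypeTechnique = groupMean(practice, ["userId", "qType", "qTechnique"], "pass");

console.table(byUser);
console.table(byUserTypeTechnique);
```

Groups come out in the order their first row appears, because a Map preserves insertion order.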
