Date: 2020-04-23

An Excel PivotChart that draws its data from AtScale’s COVID-19 Cloud OLAP model

A few weeks ago, I discussed Tableau’s release of a daily-updated dataset offering a simplified presentation of the COVID-19 global dataset from the Johns Hopkins Center for Systems Science and Engineering (JHU). This was an important step in democratizing the data so that people could connect to it and analyze it in a self-service fashion. I was one of those people, and I shared some simple analysis of my own in that post.

Meanwhile, people are hungry for more. Data lovers and epidemiology specialists want access to metrics beyond confirmed cases and deaths, as well as demographic data beyond the scope of COVID-19 itself. There’s a lot of public data out there, but tracking it down, cleaning it, blending it, and modeling it isn’t trivial. Now, several companies in the data space are working to address those pain points and make it easier to work with this wider range of data.

OLAP for COVID

Let’s start with AtScale, the San Mateo and Boston-based company focused on OLAP over big data in the cloud. The company today announced its COVID-19 Cloud OLAP model, ready for detailed analysis. AtScale hosts the model on its own platform and makes it available for querying free of charge. Source data sets include Starschema’s COVID-19 Epidemiological Data, which is available through Snowflake’s Exchange, and data from Boston Children’s Hospital at COVIDNearYou.org. AtScale says its model is updated daily, as are the source data sets.

To access the AtScale model, those interested can apply here. AtScale will respond with an email providing login credentials and connection instructions. Attached to that email are Excel and Tableau workbooks built on the model (the Excel one is shown at the top). Users can open these workbooks, enter their unique user ID and password, and begin slicing, dicing, and analyzing.

Databricks provides easy access to data and launches hackathon

Meanwhile, Databricks, the Spark-based platform that serves as a workbench for data engineers and data scientists, is also adding value to the COVID-19 data scene. For starters, Databricks has made several COVID-19 datasets natively available on its platform (on both the Amazon Web Services and Microsoft Azure clouds). Specifically, developers can find the data in the “/databricks-datasets/COVID/” folder built into the Databricks file system (DBFS), in either the paid service or the free Community Edition. In other words, spin up any Databricks cluster and the COVID-19 data will automatically be in your file system. The company has also created notebooks that demonstrate how to open the data and analyze it; details about the datasets and links to the notebooks are provided in a blog post by Denny Lee of Databricks.
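For readers who want to see what is in that folder before opening a notebook, here is a minimal Python sketch that lists the data files under it. It rests on an assumption that is not stated in Databricks’ announcement: that the cluster exposes DBFS to ordinary file APIs under a local /dbfs mount point, which may not hold on every edition or runtime. The function name list_data_files and the set of file suffixes are illustrative choices, not part of any Databricks API.

```python
from pathlib import Path

# Assumption: on a Databricks cluster, DBFS is reachable through the local
# /dbfs mount point, so the folder named in the article maps to this path.
DEFAULT_ROOT = Path("/dbfs/databricks-datasets/COVID")


def list_data_files(root=DEFAULT_ROOT, suffixes=(".csv", ".json", ".parquet")):
    """Return data files under `root` as sorted paths relative to it.

    `list_data_files` is a hypothetical helper for illustration only.
    Returns an empty list if `root` does not exist (for example, when
    run outside Databricks).
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


if __name__ == "__main__":
    # Print every data file found, one per line.
    for name in list_data_files():
        print(name)
```

Run outside Databricks, the script simply prints nothing, because the folder does not exist there. Passing a different root, such as a local copy of the data, works the same way.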

In addition to data availability, and in coordination with the upcoming Databricks Spark + AI Summit virtual event, Databricks is launching a related hackathon under the banner “Data Teams Unite!” Teams entering the hackathon will be asked to focus on COVID-19, climate change, or the challenges of their own communities (using open data resources made available by national, regional, state and local governments). Since the Databricks event is virtual this year and is free, the company expects a significant increase in attendance and hopes to see a robust hackathon turnout. Teams of up to 4 people can enter the hackathon. Three finalist teams will be selected, and Databricks will make direct donations to charities that the teams choose; the grand prize winner also receives free training and a ticket to a future Spark + AI event. The hackathon starts today and entries are due June 12. Judging will take place between June 15 and 19.

Looker and others

A number of other companies have their own offerings. For example, just yesterday, Looker, now part of Google Cloud, announced its COVID-19 Data Block, including LookML models, ready-to-run dashboards, and Looker “explores” (allowing ad hoc slicing of the data and drilling down into it). Looker’s offering uses the COVID-19 data its parent has made available, free of charge, on the BigQuery service (details here) and is offered in a hosted instance of Looker that is also free. The model’s data comes from JHU, the New York Times, the COVID Tracking Project, Definitive Healthcare, the Kaiser Family Foundation, and the Italian Department of Civil Protection.

Looker’s COVID-19 Data Block dashboard

And there’s more. Starschema and Snowflake have teamed up to provide a data share pre-loaded with COVID-19 related data (it is one of the data sources AtScale uses in its model). The share is available to current Snowflake customers or those with trial accounts; request access here. Yellowbrick is providing free access to its data warehouse service to help researchers and companies actively working on a vaccine for COVID-19 (details here). MariaDB is providing non-profit healthcare, medical and academic organizations combating COVID-19 with free access to MariaDB SkySQL. Location intelligence-focused HERE Technologies offers its COVID-19 Coronavirus Tracking site. Still not enough? Even more resources can be found in the data.world Coronavirus (COVID-19) Data Resource Center.

There are many resources here that go far beyond CSV files. Specialists focused on the crisis have many options, which should help them get to insights (and hopefully sound policies and effective protocols) faster. And if you’re not a specialist but you’re stuck at home because of the lockdown and need a project to focus on, maybe you can take advantage of all these great COVID-19 data resources too.

Updated April 22 at 12:40 pm ET to revise the Databricks hackathon due date and judging period from May 29 and June 1-5 to June 12 and June 15-19, respectively.