SW Developers - Databases

Author
Discussion

Ray Singh

Original Poster:

3,045 posts

229 months

Monday 23rd May 2016
My wife is looking after a research project for the NHS. She has collected a lot of data, and MS Excel no longer seems able to cope with it, so we are keen to find a better tool to conduct statistical analysis.

Is Postgres or Mongo any better? Can a beginner on these learn how to use them pretty easily? Are the functions easy to understand, as she is required to do lots of filtering, comparing etc.?

They already have a tool that they use as a db store for this data, but it is limited in terms of functionality. Ideally they would have a new DB with the interrogation tools built in.

Any help appreciated.

SlidingSideways

1,345 posts

231 months

Monday 23rd May 2016
If they're not very tech savvy, MS Access may be their best bet. It has a nice GUI front end, but you can write more complicated queries in SQL if required.
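
To give a feel for it, a typical filtering/comparing query might look something like this (table and column names made up for illustration - hers will obviously differ):

SELECT patient_group,
       COUNT(*) AS participants,
       AVG(score) AS mean_score
FROM SurveyResults
WHERE visit_date >= #2016-01-01#
GROUP BY patient_group
HAVING COUNT(*) >= 10
ORDER BY AVG(score) DESC;

That counts and averages results per group, keeps only groups with at least 10 rows, and sorts by the average - exactly the sort of filtering/comparing that gets painful in Excel. (The #...# date literal is Access syntax; other databases use quoted dates.)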

SystemParanoia

14,343 posts

197 months

Monday 23rd May 2016
Postgres and MS Access are relational databases.

Mongo is a NoSQL, non-relational database.

And GunDB is a graph database.

You'll have to read up to find which one is best for her!

Also, codecademy.com has a nice guide for learning MySQL/PostgreSQL.

0000

13,812 posts

190 months

Monday 23rd May 2016
Ray Singh said:
Is Postgres or Mongo any better?
Yes, massively so.

Ray Singh said:
Can a beginner on these learn how to use them pretty easily? Are the functions easy to understand, as she is required to do lots of filtering, comparing etc.?
I'd never say don't have a go... but it's probably not worth even trying. The NHS has budget for this sort of thing though; they should be able to get someone in?

nyt

1,803 posts

149 months

Monday 23rd May 2016
If your wife has Access as part of her MS Office package then it's quick and easy to import the Excel file and see how she gets on: https://support.microsoft.com/en-us/kb/141228

If Access isn't up to the task then it's relatively easy to transfer the data from Access to SQL Server (a professional-level database). The free edition can be downloaded at: https://www.microsoft.com/en-gb/server-cloud/produ...

Instructions to move access DB to SQL Server: https://support.office.com/en-us/article/Move-Acce...


Ray Singh

Original Poster:

3,045 posts

229 months

Monday 23rd May 2016
Budget is the barrier. They really cannot afford to do anything, and the current database solution is shocking. We have to export the data to Excel and then manipulate it here, but it will often crash.

I will look into researching what is best for her application.

R

nyt

1,803 posts

149 months

Monday 23rd May 2016
Perhaps you could give us an idea of how much data there is and what problem Excel is having with the analysis?

ETA: and the sort of filtering that you need.

Edited by nyt on Monday 23 May 12:40

eltawater

3,107 posts

178 months

Monday 23rd May 2016
I use Excel attached to a SQL Server Express database for large volume data analysis (mostly weblogs).

I've found that the native integration between Excel and SQL Server coupled with the simple to use but powerful charting tools in Excel works well for producing charts for reports.

The trick is to code stored procedures and views for common, repeatable operations required on the data e.g. ordered totals, trend analysis. You then get SQL Server to perform all the grunt work and only present the filtered down datasets back to Excel to perform your more detailed comparisons, otherwise you're asking Excel to ingest and handle volumes of data it doesn't really need to.

Typical things I tend to do are web page request volumes grouped by type and charted over hours, 95th percentile response time groupings, min/max/mean volumes over time, etc. I don't ask Excel to calculate those; SQL Server does that and then just presents rows back to Excel with a timestamp + label value. These are then attached to a pivot table for min/max/etc. manipulations.
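
As a rough sketch of what I mean (simplified, with made-up table/column names), a view like this makes SQL Server do the aggregation and hand back one summary row per hour per label:

CREATE VIEW dbo.HourlyRequestStats AS
SELECT
    DATEADD(hour, DATEDIFF(hour, 0, request_time), 0) AS hour_bucket, -- truncate to the hour
    request_type AS label,
    COUNT(*) AS request_count,
    MIN(response_ms) AS min_ms,
    MAX(response_ms) AS max_ms,
    AVG(CAST(response_ms AS float)) AS mean_ms -- cast so the mean isn't integer-truncated
FROM dbo.WebRequestLog
GROUP BY DATEADD(hour, DATEDIFF(hour, 0, request_time), 0), request_type;

Excel then just connects to the view and pivots whatever rows come back.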

Edited by eltawater on Monday 23 May 14:23

nyt

1,803 posts

149 months

Monday 23rd May 2016
eltawater said:
I use Excel attached to a SQL Server Express database for large volume data analysis (mostly weblogs).

I've found that the native integration between Excel and SQL Server coupled with the simple to use but powerful charting tools in Excel works well for producing charts for reports.

The trick is to code stored procedures and views for common, repeatable operations required on the data e.g. ordered totals, trend analysis. You then get SQL Server to perform all the grunt work and only present the filtered down datasets back to Excel to perform your more detailed comparisons, otherwise you're asking Excel to ingest and handle volumes of data it doesn't really need to.

Typical things I tend to do are web page request volumes grouped by type and charted over hours, 95th percentile response time groupings, min/max/mean volumes over time, etc. I don't ask Excel to calculate those; SQL Server does that and then just presents rows back to Excel with a timestamp + label value. These are then attached to a pivot table for min/max/etc. manipulations.

Edited by eltawater on Monday 23 May 14:23
OP - Sorry for the hijack

eltawater - Have you tried Splunk** for your analysis? It produces excellent results with little effort.
There's even a free version if your volumes are not too large.
Worth a look.

** Other, similar products exist

PurpleTurtle

6,940 posts

143 months

Monday 23rd May 2016
O/T but this is what irritates me about the NHS - the world's fifth largest employer, but still the people working in it have to get their husband to ask blokes on PH for free IT advice. WTF are they spunking our taxes on!?

Not having a go at you OP, it's just the need to ask that frustrates me ... she should have the tools at her disposal to do this. It's a NATIONAL Health Service, they should have NATIONAL IT provision, especially in general tools for data analysis.

Alas they don't, so you get things like my local hospital spunking £28 million (£18 million of which was an overspend) down the drain on this debacle because each Trust is left to do its own thing where IT is concerned. banghead



eltawater

3,107 posts

178 months

Monday 23rd May 2016
I'm aware of other teams using it for simpler applications, but there have been a few bespoke requirements and constraints on my side which have meant an in-house developed analysis solution has been more appropriate.

Vaud

50,289 posts

154 months

Monday 23rd May 2016
Ray Singh said:
Budget is the barrier. They really cannot afford to do anything, and the current database solution is shocking. We have to export the data to Excel and then manipulate it here, but it will often crash.

I will look into researching what is best for her application.

R
If she does select a new tool, who is going to install, secure and manage it? Be careful with research data smile

That said, if you get on with Postgres, you can pick up Greenplum to run on VirtualBox (free) together with MADlib (open-source analytics tools) at:

https://network.pivotal.io/products/pivotal-gpdb#/...
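
Once that's running, the MADlib routines are just SQL calls - something like this, if I remember the syntax right (table name made up):

-- descriptive stats (counts, means, etc.) for every column, written to an output table
SELECT * FROM madlib.summary('research_data', 'research_data_summary');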


Munter

31,319 posts

240 months

Monday 23rd May 2016
PurpleTurtle said:
O/T but this is what irritates me about the NHS - the world's fifth largest employer, but still the people working in it have to get their husband to ask blokes on PH for free IT advice. WTF are they spunking our taxes on!?

Not having a go at you OP, it's just the need to ask that frustrates me ... she should have the tools at her disposal to do this. It's a NATIONAL Health Service, they should have NATIONAL IT provision, especially in general tools for data analysis.

Alas they don't, so you get things like my local hospital spunking £28 million (£18 million of which was an overspend) down the drain on this debacle because each Trust is left to do its own thing where IT is concerned. banghead
You'd think the NHS would have used G-Cloud to supply a PaaS database that anybody and everybody in the NHS could simply make use of. But then the NHS has very little N about it at this level. Then the H has been put on hold until the S can be found as well.

Vaud

50,289 posts

154 months

Monday 23rd May 2016
Munter said:
You'd think the NHS would have used G-Cloud to supply a PaaS database that anybody and everybody in the NHS could simply make use of. But then the NHS has very little N about it at this level. Then the H has been put on hold until the S can be found as well.
Indeed. There are a bunch of them on G-Cloud 6.

Heck, a lot of the providers would give you a sandbox or temp licence for free if they could get a case study out of it...

SwissJonese

1,393 posts

174 months

Monday 23rd May 2016
Depends on what version of Excel you have and how much memory your PC has. With 32-bit Excel you quickly reach the 3GB memory limit; with 64-bit Excel the limit is how much RAM your PC has.

Look at Power Query (Get & Transform in Excel 2016), Power Pivot and Power View - all free add-ins to Excel that can easily handle millions of records (again, down to the available RAM and virtual memory on the PC).

Power BI is also worth a look - the free desktop version makes visualising data very easy.

Vaud

50,289 posts

154 months

Monday 23rd May 2016
Free option: Google Spreadsheet has a limit of 400,000 cells.

How big is the data set?

marshalla

15,902 posts

200 months

Monday 23rd May 2016
Vaud said:
Free option: Google Spreadsheet has a limit of 400,000 cells.

How big is the data set?
Or LibreOffice, which will do 1,073,741,824 cells (1,024 columns by 1,048,576 rows) and won't have as many DPA/info-sec policy issues as using Google might for NHS data.


Vaud

50,289 posts

154 months

Monday 23rd May 2016
Another option:

Approach a local university with a computer science department and see if any 1st/2nd years want a small project to add to their CV... a savvy one will probably build what you need in a few hours.

Goaty Bill 2

3,393 posts

118 months

Monday 23rd May 2016
It's been mentioned in passing, but if there is any personal data in the data set(s), then security is of prime importance.

Companies / software that host or process your data, but won't take corporate responsibility when something goes wrong, are probably to be avoided.
I would imagine any internet/cloud solution would (should) require authorisation before use.


ETA
What database (if it is a database) is the "db store"?


Edited by Goaty Bill 2 on Monday 23 May 16:40

Efbe

9,251 posts

165 months

Monday 23rd May 2016
Ray Singh said:
My wife is looking after a research project for the NHS. She has collected a lot of data, and MS Excel no longer seems able to cope with it, so we are keen to find a better tool to conduct statistical analysis.

Is Postgres or Mongo any better? Can a beginner on these learn how to use them pretty easily? Are the functions easy to understand, as she is required to do lots of filtering, comparing etc.?

They already have a tool that they use as a db store for this data, but it is limited in terms of functionality. Ideally they would have a new DB with the interrogation tools built in.

Any help appreciated.
You can tell there aren't any developers actually answering this, because...

How much data do you have?
What is too big?