SW Developers - Databases
Discussion
My wife is looking after a research project for the NHS. She has collected a lot of data. MS Excel doesn’t seem to be able to cope with the vast amount of data now and we are keen to find a better tool to conduct statistical analysis.
Is Postgres or Mongo any better? Can a beginner on these learn how to use them pretty easily? Are the functions easy to understand, as she is required to do lots of filtering, comparing etc.?
They already have a tool as a DB store that they use to keep this data, but this is limited in terms of functionality. Ideally they would have a new DB with the interrogation tools built in.
Any help appreciated.
Ray Singh said:
Is Postgres or Mongo any better?
Yes, massively so.
Ray Singh said:
Can a beginner on these learn how to use them pretty easily? Are the functions easy to understand as she is required to do lots of filtering, comparing etc?
I'd never say don't have a go... but it's probably not worth even trying. The NHS has budget for this sort of thing though; they should be able to get someone in?
If your wife has Access as part of her MS Office package then it's quick and easy to import the Excel file and see how she gets on: https://support.microsoft.com/en-us/kb/141228
If Access isn't up to the task then it's relatively easy to transfer the data from Access to SQL Server (a professional level Database). The free edition can be downloaded at: https://www.microsoft.com/en-gb/server-cloud/produ...
Instructions to move access DB to SQL Server: https://support.office.com/en-us/article/Move-Acce...
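To give a flavour of what "getting it out of the spreadsheet and into a database" buys you - here's a rough sketch using Python's built-in sqlite3 as a stand-in for Access/SQL Server. The file layout and column names are made up for the example:

```python
# Illustrative sketch: loading a CSV exported from Excel into a relational
# database. sqlite3 (Python stdlib) stands in for Access/SQL Server here;
# the column names and values are invented for the example.
import csv
import io
import sqlite3

# A tiny stand-in for the exported spreadsheet; in practice this would be
# open("export.csv") instead of an in-memory string.
csv_data = io.StringIO(
    "patient_id,age,score\n"
    "1,34,7.2\n"
    "2,61,5.9\n"
    "3,47,8.1\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (patient_id INTEGER, age INTEGER, score REAL)")

reader = csv.DictReader(csv_data)
rows = [(r["patient_id"], r["age"], r["score"]) for r in reader]
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

# Once the data is in a database, filtering is a one-line query rather than
# scrolling and sorting a sheet:
over_40 = conn.execute(
    "SELECT patient_id, score FROM results WHERE age > 40 ORDER BY score DESC"
).fetchall()
print(over_40)  # [(3, 8.1), (2, 5.9)]
```

The same filtering/comparing queries work essentially unchanged against Postgres or SQL Server once the data has been migrated.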
I use Excel attached to a SQL Server Express database for large volume data analysis (mostly weblogs).
I've found that the native integration between Excel and SQL Server coupled with the simple to use but powerful charting tools in Excel works well for producing charts for reports.
The trick is to code stored procedures and views for common, repeatable operations required on the data e.g. ordered totals, trend analysis. You then get SQL Server to perform all the grunt work and only present the filtered down datasets back to Excel to perform your more detailed comparisons, otherwise you're asking Excel to ingest and handle volumes of data it doesn't really need to.
Typical things I tend to do are web page request volumes grouped by type and charted over hours, 95th percentile response time groupings, min/max/mean volumes over time etc. I don't ask Excel to calculate those; SQL Server does that and then just presents rows back to Excel with a timestamp + label value. These are then attached to a pivot table for min/max/etc. manipulations.
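The pattern above can be sketched in a few lines - here using Python's built-in sqlite3 as a stand-in for SQL Server Express, with an invented weblog table. The point is that the reusable aggregation lives in the database as a view, and the client only ever pulls the summarised rows:

```python
# Sketch of "let the database do the grunt work": a view holds the common
# aggregation, and the client receives only summary rows. sqlite3 stands in
# for SQL Server Express; table and column names are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weblog (ts_hour TEXT, page_type TEXT, resp_ms REAL)")
conn.executemany(
    "INSERT INTO weblog VALUES (?, ?, ?)",
    [
        ("09:00", "search", 120.0),
        ("09:00", "search", 180.0),
        ("09:00", "detail", 90.0),
        ("10:00", "search", 150.0),
    ],
)

# The repeatable operation is defined once, server-side.
conn.execute("""
    CREATE VIEW hourly_stats AS
    SELECT ts_hour,
           page_type,
           COUNT(*)     AS requests,
           MIN(resp_ms) AS min_ms,
           MAX(resp_ms) AS max_ms,
           AVG(resp_ms) AS mean_ms
    FROM weblog
    GROUP BY ts_hour, page_type
""")

# Excel would attach to this view and ingest three summary rows instead of
# every raw log row.
summary = conn.execute(
    "SELECT * FROM hourly_stats ORDER BY ts_hour, page_type"
).fetchall()
for row in summary:
    print(row)
```

With real volumes the raw table might hold millions of rows, but the spreadsheet only ever sees the handful of grouped results.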
Edited by eltawater on Monday 23 May 14:23
eltawater said:
I use Excel attached to a SQL Server Express database for large volume data analysis (mostly weblogs).
I've found that the native integration between Excel and SQL Server coupled with the simple to use but powerful charting tools in Excel works well for producing charts for reports.
The trick is to code stored procedures and views for common, repeatable operations required on the data e.g. ordered totals, trend analysis. You then get SQL Server to perform all the grunt work and only present the filtered down datasets back to Excel to perform your more detailed comparisons, otherwise you're asking Excel to ingest and handle volumes of data it doesn't really need to.
Typical things I tend to do are web page request volumes grouped by type and charted over hours, 95th percentile response time groupings, min/max/mean volumes over time etc. I don't ask Excel to calculate those; SQL Server does that and then just presents rows back to Excel with a timestamp + label value. These are then attached to a pivot table for min/max/etc. manipulations.
OP - Sorry for the hijack.
eltawater - Have you tried Splunk** for your analysis? It produces excellent results with little effort.
There's even a free version if your volumes are not too large.
Worth a look.
** Other, similar products exist
O/T but this is what irritates me about the NHS - the world's fifth largest employer, but still the people working in it have to get their husband to ask blokes on PH for free IT advice. WTF are they spunking our taxes on!?
Not having a go at you OP, it's just the need to ask that frustrates me ... she should have the tools at her disposal to do this. It's a NATIONAL Health Service, they should have NATIONAL IT provision, especially in general tools for data analysis.
Alas they don't, so you get things like my local hospital spunking £28 million (£18 million of which was an overspend) down the drain on this debacle because each Trust is left to do its own thing where IT is concerned.
Ray Singh said:
Budget is the barrier. They really cannot afford to do anything, and the current database solution is shocking. We have to export the data to Excel and then manipulate it there, but it will often crash.
I will look into researching what is best for her application.
R
If she does select a new tool, who is going to install, secure and manage it? Be careful with research data.
That said, if you get on with Postgres, you can pick up Greenplum to run on VirtualBox (free) together with MADlib (open-source tools) at:
https://network.pivotal.io/products/pivotal-gpdb#/...
PurpleTurtle said:
O/T but this is what irritates me about the NHS - the world's fifth largest employer, but still the people working in it have to get their husband to ask blokes on PH for free IT advice. WTF are they spunking our taxes on!?
Not having a go at you OP, it's just the need to ask that frustrates me ... she should have the tools at her disposal to do this. It's a NATIONAL Health Service, they should have NATIONAL IT provision, especially in general tools for data analysis.
Alas they don't, so you get things like my local hospital spunking £28 million (£18 million of which was an overspend) down the drain on this debacle because each Trust is left to do its own thing where IT is concerned.
You'd think the NHS would have used G-Cloud to supply a PaaS database that anybody and everybody in the NHS could simply make use of. But then the NHS has very little N about it at this level. Then the H has been put on hold until the S can be found as well.
Munter said:
You'd think the NHS would have used G-Cloud to supply a PaaS database that anybody and everybody in the NHS could simply make use of. But then the NHS has very little N about it at this level. Then the H has been put on hold until the S can be found as well.
Indeed. There are a bunch of them in the G-Cloud 6.
Heck, a lot of the providers would give you a sandbox or temp licence for free if they could get a case study out of it...
Depends on what version of Excel you have and how much memory your PC has. If it's only 32-bit Excel you quickly reach the 3GB memory limit; with 64-bit Excel it's down to how much RAM your PC has.
Look at Power Query (Get & Transform in Excel 2016), Power Pivot and Power View - all these are free add-ons to Excel and can easily handle millions of records (again, down to the available RAM (virtual RAM) on the PC).
Worth a look at Power BI - free desktop version for doing very easy visualizations of data.
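The reason those tools cope where a plain sheet falls over is that they stream the data rather than holding every row at once. A rough illustration of that general idea, in Python with only the standard library (the file layout and column names are invented):

```python
# Illustration of streaming aggregation: only running totals live in memory,
# instead of materialising every row the way a 32-bit spreadsheet must.
# Column names and data are made up for the example.
import csv
import io

# Stand-in for a large exported CSV; in practice this would be
# open("data.csv") rather than an in-memory string.
big_csv = io.StringIO("group,value\n" + "\n".join(
    f"{'ab'[i % 2]},{i}" for i in range(100_000)
))

totals: dict[str, int] = {}
counts: dict[str, int] = {}
for row in csv.DictReader(big_csv):  # one row in memory at a time
    g = row["group"]
    totals[g] = totals.get(g, 0) + int(row["value"])
    counts[g] = counts.get(g, 0) + 1

means = {g: totals[g] / counts[g] for g in totals}
print(means)  # {'a': 49999.0, 'b': 50000.0}
```

A hundred thousand rows pass through, but the working set is just two small dictionaries - which is why record count alone needn't be the limit.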
It's been mentioned in passing, but if there is any personal data in the data set(s), then security is of prime importance.
Companies / software that host or process your data, but won't take corporate responsibility when something goes wrong, are probably to be avoided.
I would imagine any internet/cloud solution would (should) require authorisation before use.
ETA
What database (if it is a database) is the "db store"?
Edited by Goaty Bill 2 on Monday 23 May 16:40
Ray Singh said:
My wife is looking after a research project for the NHS. She has collected a lot of data. MS Excel doesn’t seem to be able to cope with the vast amount of data now and we are keen to find a better tool to conduct statistical analysis.
Is Postgres or Mongo any better? Can a beginner on these learn how to use them pretty easily? Are the functions easy to understand, as she is required to do lots of filtering, comparing etc.?
They already have a tool as a DB store that they use to keep this data, but this is limited in terms of functionality. Ideally they would have a new DB with the interrogation tools built in.
Any help appreciated.
Can tell there aren't any developers actually answering this, because...
how much data do you have?
What is too big?